Abstract

The promise of software transactional memory (STM) is to combine an easy-to-use programming
interface with an efficient utilization of the concurrent-computing abilities provided by
modern machines. But does this combination come with an inherent cost?

We evaluate the cost of concurrency by measuring the amount of expensive synchronization
that must be employed in an STM implementation that ensures positive concurrency,
i.e., allows for concurrent transaction processing in some executions. We focus on two
popular progress conditions that provide positive concurrency: progressiveness and
permissiveness.

We show that in permissive STMs, providing a very high degree of concurrency, a transaction
performs a linear number of expensive synchronization patterns with respect to its
read-set size. In contrast, progressive STMs provide a very small degree of concurrency
but, as we demonstrate, can be implemented using at most one expensive synchronization
pattern per transaction. However, we show that even in progressive STMs, a transaction
has to "protect" (e.g., by using locks or strong synchronization primitives) a linear amount
of data with respect to its write-set size. Our results suggest that looking for high degrees
of concurrency in STM implementations may bring a considerable synchronization cost.
1 Introduction



The software transactional memory (STM) paradigm promises to efficiently exploit the concur- 
rency provided by modern computers while offering an easy-to-use programming interface. It 
allows a programmer to write a concurrent program as a sequence of transactions. A transaction 
is a series of read and write operations on transactional objects (or t-objects). An STM
implementation turns this series into a sequence of accesses to underlying base objects and exports
"all-or-nothing" semantics: every transaction either commits, in which case all its operations are
expected to instantaneously "take effect", or aborts, in which case the transaction does not affect
any other transaction. In this paper, the default STM correctness property is opacity [13, 15]
that, informally, requires that in every execution, there is a total order on all transactions, 
including aborted ones, where every read operation returns the argument of the last committed 
write operation on the read t-object. 

An STM implementation that aborts every transaction is trivially correct but useless. There- 
fore, we need to specify a progress condition that captures the execution scenarios in which a 
transaction should commit. Consider, for example, a simple non-trivial progress condition that 
requires a transaction to commit if it does not overlap with any other transaction. This condi- 
tion can be implemented using a single lock that is acquired at the beginning of a transaction 
and released at its end. The resulting "single-lock" STM will be running one transaction at 
a time, thus ignoring the potential benefits of multiprocessing. Similarly, an obstruction-free
STM [12] that only requires a transaction to commit if it eventually runs with no contention
allows for no concurrency at all. But to exploit the power of modern multiprocessor machines, 
an STM implementation must allow at least some transactions to make progress concurrently. 
If this is the case, we say that the implementation provides positive concurrency, in contrast to 
zero concurrency provided by "single-lock" and obstruction-free STMs. 
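The single-lock scheme described above can be sketched in a few lines. This is only an illustrative toy, not an implementation from the paper: the class name `SingleLockSTM`, the `atomically` helper and the dictionary-backed store are assumptions made for the example, and transactions here never abort.

```python
import threading


class SingleLockSTM:
    """Toy single-lock STM: one global lock serializes all transactions."""

    def __init__(self):
        self._lock = threading.Lock()  # the single global lock
        self._store = {}               # t-object name -> value (0 if never written)

    def atomically(self, body):
        # The lock is acquired at the beginning of the transaction and
        # released at its end, so at most one transaction runs at a time.
        with self._lock:
            return body(self._read, self._write)

    def _read(self, x):
        return self._store.get(x, 0)

    def _write(self, x, v):
        self._store[x] = v


def increment(stm, x, times):
    """Run `times` transactions, each incrementing t-object x by one."""
    def body(read, write):
        write(x, read(x) + 1)
    for _ in range(times):
        stm.atomically(body)


stm = SingleLockSTM()
threads = [threading.Thread(target=increment, args=(stm, "X", 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
final_value = stm.atomically(lambda read, write: read("X"))
```

Every transaction commits, but no two transactions ever overlap, which is exactly the zero-concurrency behaviour the text attributes to single-lock STMs.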

In this paper, we try to understand the inherent costs of allowing multiple concurrent trans- 
actions to commit. Therefore, we focus on progress conditions that provide positive concurrency: 
progressiveness [U] and permissiveness. Informally, a progressive STM [13] provides a very
small degree of concurrency by only enforcing a transaction T to commit if it encounters no 
concurrent conflicting transaction T': T and T' conflict on a t-object X if they concurrently 
access X and one of the transactions tries to update X. A stronger variant of progressiveness, 
called strong progressiveness, additionally requires that in case a set of transactions conflict on 
at most one t-object, at least one transaction commits. A much more demanding permissive 
STM [11] stipulates that a transaction must commit, unless committing it violates correctness,
which, informally, provides the highest degree of concurrency. 

To understand the inherent cost of positive concurrency in STM implementations, we first 
consider the number of RAW/AWAR synchronization patterns [6] that must be performed by a
process in the course of a transaction. A read-after-write (RAW) pattern consists of a write to
a (shared) base object x followed by a read from a different base object y (without a write to
y in between). An atomic write-after-read (AWAR) pattern consists of an atomic (indivisible)
execution of a read of a base object followed by a write on a (possibly the same) base object.
Accounting for RAW/AWAR patterns is important since most modern processor architectures
use relaxed memory models, where maintaining the order of operations in a RAW requires a
memory fence [21] and each AWAR is manifested as an atomic instruction such as
Compare-and-Swap (CAS). In most architectures, memory fences and atomic instructions are believed to
be considerably slower than regular shared-memory accesses [H [191 [2H [20].

We show that every permissive and opaque STM implementation has, for any $m \in \mathbb{N}$, an
execution in which a transaction with a read set of size m incurs $\Omega(m)$ consecutive RAW/AWAR
patterns. This contrasts with a single-lock STM that uses only one such pattern, since a
successful lock acquisition can be implemented using only one (multi-)RAW [IsQ or AWAR [1].
We show that one RAW/AWAR is in fact optimal for single-lock STMs. Moreover, we present
implementations of progressive STMs that employ just a single RAW or AWAR pattern per 
transaction. Also, we describe a strongly progressive space-bounded STM implementation that 
incurs four RAWs per transaction. 

These implementations suggest that the RAW/AWAR metric is too coarse-grained to evalu- 
ate the complexity of progressive STMs. Therefore, we introduce a new metric called protected 
data size that, intuitively, captures the amount of data that a transaction must exclusively 
control at some point of its execution. All progressive STM implementations we are aware of 
(see, e.g., an overview in [13]) use locks or timing assumptions to give an updating transaction
exclusive access to all objects in its write set at some point of its execution. For example, lock-based
progressive implementations require that a transaction grabs all locks on its write set before
updating the corresponding base objects. Our results show that this is an inherent price to pay for
providing progressive concurrency: every committed transaction in a progressive and
disjoint-access-parallel^ STM implementation must, at some point of its execution, protect every object
in its write set. Interestingly, as our progressive implementations show, the transaction's read
set does not need to be protected.

In brief, our results imply that providing high degrees of concurrency in opaque STM im- 
plementations incurs a considerable synchronization cost. Permissive STMs, while providing 
the best possible concurrency in theory, require a strong synchronization primitive or a mem- 
ory fence per read operation, which may result in excessively slow execution times. Progressive 
STMs provide only basic concurrency but perform considerably better in this respect: we present 
progressive implementations that incur constant RAW/AWAR complexity. Does this mean that 
maximizing the ability of processing multiple transactions in parallel should not be an important 
factor in STM design? Should we rather assume the little positive concurrency provided by
progressiveness, or even focus on speculative single-lock solutions à la flat combining [16]? It is
difficult to say affirmatively, but our results suggest so.

The rest of the paper is organized as follows. Section 2 briefly introduces our system model
and recalls the correctness criteria in STM. Section 3 presents some useful properties of STM
implementations and Section 4 recalls the definitions of progress conditions of STM, including
progressiveness and permissiveness. Section 5 presents the definitions of RAW/AWAR
complexity. Section 6 presents a linear lower bound on the number of RAW/AWAR patterns executed
by a transaction in a permissive STM. Section 7 describes our progressive STM implementations
that perform a constant number of RAWs or AWARs per transaction and presents a lower bound on the
amount of data to be protected by a transaction in a progressive STM. Section 8 summarizes
some related work and Section 9 concludes the paper. Detailed proofs are deferred to the
optional Appendix.

2 Model 

Our STM model, while keeping the spirit of the original definitions of [13, 15], introduces some
refinements that are instrumental for our results. 

^ A multi-RAW consists of a series of writes followed by a series of reads from distinct locations. Maintaining
the multi-RAW order can be achieved with a single memory fence. 

^ A disjoint-access-parallel STM implementation [17, 8] guarantees that concurrent transactions accessing
disjoint sets of transactional objects are executed independently of each other, i.e., without conflicting on the 
base objects. 
Transactions. Transactional memory provides the ability of reading and writing to a set
of transactional objects, or t-objects, using atomic transactions. A transaction is a sequence
of accesses (reads or writes) to t-objects. We assume that every transaction $T_k$ has a unique
identifier k. Formally, STM exports the following operations (called tm-operations in the paper):
(1) $read_k(X)$ that returns a value in a set V or a special value $A_k \notin V$ (abort); (2) $write_k(X, v)$
that returns $ok_k$ or $A_k$; (3) $tryC_k$ that returns $C_k \notin V$ (commit) or $A_k$; and (4) $tryA_k$ that
returns $A_k$.

A history H is a sequence of invocations and responses of tm-operations. A history H
is sequential if every invocation is either the last event in H or is immediately followed by a
matching response. $H|k$ denotes the subsequence of H restricted to events with index k. If $H|k$
is non-empty we say that $T_k$ participates in H, and $parts(H)$ denotes the set of transactions
that participate in H. A history is well-formed if for all $T_k$, $H|k$ is sequential and contains no
events that appear after $A_k$ or $C_k$. Throughout this paper, we assume that all histories are
well-formed, i.e., the user of transactional memory never invokes a new operation before receiving
a response from the current one and does not invoke any operation $op_k$ after $T_k$ has returned
$C_k$ or $A_k$. A history H is complete if for every $T_k \in parts(H)$, $H|k$ ends with a response event.
A transaction $T_k \in parts(H)$ is live in H if $H|k$ does not end with $A_k$ or $C_k$. Otherwise, $T_k$
is called complete. A history is t-complete if $parts(H)$ contains only complete transactions. A
transaction $T_k \in parts(H)$ is forcefully aborted in H if some operation $op_k \neq tryA_k$ returns $A_k$.
Two histories H and H' are equivalent if for every transaction $T_k$, $H|k = H'|k$.

The read set (resp., the write set) of a transaction $T_k \in parts(H)$, denoted $Rset(T_k)$ (resp.,
$Wset(T_k)$), is the set of t-objects that $T_k$ reads (resp., writes to) in H. $Dset(T_k) = Rset(T_k) \cup Wset(T_k)$
is called the data set of $T_k$. A transaction $T_k$ is called read-only if $Wset(T_k) = \emptyset$;
otherwise, it is called updating.

Real-time and deferred-update orders. For $T_k, T_m \in parts(H)$, we say that $T_k$ precedes
$T_m$ in the real-time order in H, and we write $T_k \prec_H T_m$, if $T_k$ is committed or aborted and the
last event of $T_k$ precedes the first event of $T_m$ in H. If neither $T_k \prec_H T_m$ nor $T_m \prec_H T_k$, then we
say that $T_k$ and $T_m$ are concurrent in H. A transaction $T_k \in parts(H)$ which is not concurrent
with any other transaction in H is called uncontended in H. A history H is t-sequential if no
two transactions are concurrent in H.
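The real-time order and the notion of concurrency can be checked mechanically on a small history. The following is a minimal sketch under assumptions of our own: a history is encoded as a list of `(k, label)` events, and a transaction counts as complete when its last label is `"C"` or `"A"`. The function names are hypothetical and not taken from the paper.

```python
def parts(H):
    """Identifiers of the transactions that participate in H."""
    return {k for k, _ in H}


def is_complete(H, k):
    """T_k is complete if H|k ends with a commit ("C") or abort ("A") response."""
    Hk = [label for j, label in H if j == k]
    return bool(Hk) and Hk[-1] in ("C", "A")


def precedes(H, k, m):
    """Real-time order: T_k is complete and its last event is before T_m's first."""
    if not is_complete(H, k):
        return False
    last_k = max(i for i, (j, _) in enumerate(H) if j == k)
    first_m = min(i for i, (j, _) in enumerate(H) if j == m)
    return last_k < first_m


def concurrent(H, k, m):
    """T_k and T_m are concurrent if neither precedes the other."""
    return k != m and not precedes(H, k, m) and not precedes(H, m, k)


def is_t_sequential(H):
    """H is t-sequential if no two transactions are concurrent in H."""
    ids = sorted(parts(H))
    return not any(concurrent(H, k, m) for k in ids for m in ids)


# T_1 and T_2 overlap; T_3 starts only after both have committed.
H = [
    (1, "inv read(X)"), (1, "0"),
    (2, "inv write(X,1)"), (2, "ok"),
    (1, "inv tryC"), (1, "C"),
    (2, "inv tryC"), (2, "C"),
    (3, "inv read(X)"), (3, "1"),
    (3, "inv tryC"), (3, "C"),
]
```

In this example history, $T_1$ and $T_2$ are concurrent, both precede $T_3$, and the history is therefore not t-sequential.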

For $T_k, T_m \in parts(H)$, we say that $T_k$ precedes $T_m$ in the deferred-update order, and we
write $T_k \prec_H^{DU} T_m$, if there exists $X \in Rset(T_k) \cap Wset(T_m)$ such that $T_m$ has committed and the
response of $read_k(X)$ precedes the invocation of $tryC_m()$ in H. For $T_k, T_m \in parts(H)$, we write
$T_k \prec_H^{RF} T_m$ if $T_k$ has committed and the response of $read_m(X)$, $X \in Rset(T_m) \cap Wset(T_k)$, returns
v, the value of X updated in $write_k(X, v)$.

Legal histories. Let H be a complete t-sequential history. For every operation $read_k(X)$ in
H that reads a t-object X, we define the latest written value of X as follows: (1) If $T_k$ contains
a $write_k(X, v)$ preceding $read_k(X)$, then the latest written value of X is the value of the latest
such write. (2) Otherwise, if H contains a $write_m(X, v)$ such that $m \neq k$, $T_m$ precedes $T_k$,
and $T_m$ commits in H, then the latest written value of X is the value of the latest such write
in H. (3) Otherwise, the latest written value of X is the initial value of X. Without loss of
generality, we assume that H starts with a fictitious initializing transaction $T_0$ that writes to
every t-object. We say that a complete t-sequential history H is legal if for every t-object X,
every read of X in H returns the latest written value of X.
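The three-case definition of the latest written value translates directly into a legality check. The sketch below assumes an encoding that is not in the paper: a complete t-sequential history is a list of transactions in order, each a dictionary with an `ops` list of `("r", X, v)` or `("w", X, v)` tuples and a `committed` flag, and every t-object starts at 0.

```python
def is_legal(transactions, initial=0):
    """Check that every read returns the latest written value."""
    committed = {}  # X -> value of the latest write by a committed transaction
    for t in transactions:
        local = {}  # X -> latest value written so far by this transaction
        for kind, x, v in t["ops"]:
            if kind == "w":
                local[x] = v
            else:
                # Case (1): own earlier write; case (2): latest committed
                # write by a preceding transaction; case (3): initial value.
                expected = local[x] if x in local else committed.get(x, initial)
                if v != expected:
                    return False
        if t["committed"]:
            committed.update(local)
    return True


legal_history = [
    {"ops": [("w", "X", 1)], "committed": True},
    {"ops": [("w", "X", 2)], "committed": False},           # aborted: write ignored
    {"ops": [("r", "X", 1), ("w", "Y", 5), ("r", "Y", 5)], "committed": True},
]
illegal_history = [
    {"ops": [("w", "X", 1)], "committed": False},
    {"ops": [("r", "X", 1)], "committed": True},             # reads an aborted write
]
```

The second history is illegal because its read returns a value written by a transaction that did not commit.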

Opacity. Let H be any complete sequential history. Now $\bar{H}$ denotes a history constructed
from H as follows: (1) for every live transaction $T_k$ in H, we insert $tryC_k \cdot A_k$ immediately after
the last event of $T_k$ in H, and (2) for every aborted transaction $T_k$ in H, we remove all write
operations in $T_k$ with the matching responses.

Definition 1 A complete sequential history H is opaque if there exists a legal complete
t-sequential history S such that (1) $\bar{H}$ and S are equivalent and (2) S respects $\prec_H$ and $\prec_H^{DU}$.

We call such a legal complete t-sequential history S a serialization of H. A weaker property, 
called strict serializability [22], guarantees opacity with respect to committed transactions in 
H. Obviously, every opaque history is also strictly serializable. 



Implementations. We consider an asynchronous shared-memory system in which processes 
$p_1, \ldots, p_n$ communicate by executing atomic operations on shared base objects.

An STM implementation provides the processes with algorithms for operations $read_k$, $write_k$,
$tryC_k$ and $tryA_k$. Without loss of generality, we assume that base objects are accessed with
atomic read-write operations, but we allow the programmer to aggregate a sequence of op- 
erations on base objects using clearly demarcated atomic sections: the operations within an 
atomic section are to be executed sequentially. The atomic-section construct is general enough 
to implement various strong synchronization primitives, such as test-and-set (TAS) or compare- 
and-swap (CAS). We assume that atomic sections may only contain a bounded number of 
base-object operations. 

An execution of an implementation M is a sequence of atomic accesses to base objects
(base-object events), and invocations and responses of the TM operations (TM-events). If a
base-object event is a write or an atomic section that contains a write (in one of its execution
paths), we say that the event is non-trivial.

A configuration of M (after some execution E) is determined by the states of all base objects 
and the states of the processes. An initial state of M is determined by the initial states of base 
objects and t-objects. We assume that each base object and each t-object is initialized to 0. A 
history of an execution E, denoted by $E|_{TM}$, is the subsequence of E restricted to TM-events.
$E|_{TM,p_i}$ denotes the subsequence of $E|_{TM}$ restricted to events issued by process $p_i$.

The interval of a transaction $T_k$ in E is the fragment of E that starts with the first event of
$T_k$ in E and ends with the completing event of $T_k$ ($A_k$ or $C_k$) in E, or, if $T_k$ has not completed
in E, with the last event of E. A tm-operation $op_1$ precedes $op_2$ in H if the invocation of $op_2$
appears after the response of $op_1$ in H. An execution E is well-formed if every atomic section
is executed sequentially in E, $E|_{TM,p_i}$ is t-sequential for each $p_i$, and no event on behalf of a
transaction $T_k$ is taking place outside of an interval between invocation and response of some
TM-operation in $T_k$. We assume here that a TM implementation generates only well-formed
executions.

A completion of H is a history constructed from H by removing some pending invocations
and adding responses to the remaining pending invocations to the end of H. To account for
initial values of t-objects, we add to the beginning of H a (fictitious) transaction $T_0$ that writes
to every t-object and commits.

A complete sequential history H' is a linearization of H if there exists a history H" , a 
completion of H, such that (1) H' respects the precedence order of H, and (2) H' and H" are 
equivalent. 

Definition 2 An STM implementation M is opaque if for every execution E of M, there exists
an opaque linearization of $E|_{TM}$.
3 Preliminaries



In this section, we define some useful properties of STM implementations and prove some simple 
facts that follow from these definitions. 

Access patterns. The definition of STM allows a process to alternate reading and writing to 
t-objects arbitrarily in the course of a transaction. Moreover, it allows a process to read from a 
t-object that was previously written within the same transaction. We show that this flexibility 
can be obtained "for free" given an implementation that only allows a user to read from a set
of t-objects and then to write to a set of t-objects within a transaction.

We say that a transaction $T_k$ is canonic in a history H if $H|k$ consists of a sequence of
reads (of distinct t-objects) followed by a sequence of writes (to distinct t-objects). A general
complexity of an STM implementation M accounts for the number of accesses to base objects
used to implement every given transaction in every execution of M.

Lemma 3 Let M be an opaque STM implementation that can only be accessed with canonic
transactions. Then there exists an opaque STM implementation M' that preserves the complexity
of M.

Proof. Let $read_k^M$, $write_k^M$, $tryC_k^M$ and $tryA_k^M$ denote the implementations of the operations
provided by M. Now M' is constructed as follows. 

We associate every transaction $T_k$ with a local variable $Wset(T_k)$ which contains, at any
moment of time, the current write set of $T_k$ with the values to be written.

When $write_k(X, v)$ is invoked, $(X, v)$ is simply added to $Wset(T_k)$ and all other entries of
the form $(X, v')$ are removed from $Wset(T_k)$. When $read_k(X)$ is invoked, we first check if X is
in $Wset(T_k)$ and if so, we return the value stored in $Wset(T_k)$. Otherwise, we invoke $read_k^M(X)$
and return the obtained value.

When $tryC_k()$ is invoked, we first execute $write_k^M(X, v)$ for each $(X, v) \in Wset(T_k)$, and then
invoke $tryC_k^M()$ and return its response. Since for each X there can be at most one entry of the
form $(X, v)$, the order in which these write operations are invoked does not matter. Also, since all
invocations of $write_k^M$ follow all invocations of $read_k^M$, the resulting sequence of invocations of M
on behalf of $T_k$ is a canonic transaction. Operation $tryA_k()$ is implemented as $tryA_k^M()$.

Since M is opaque, the resulting implementation is also opaque: just use the serialization 
of the resulting history of M. Since the modifications of M involve only local variables, the 
base-object complexity of M' is the same as that of M. □ 
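The construction of M' in the proof can be sketched as a thin wrapper that buffers writes locally. Everything named here is hypothetical: `ToyCanonicSTM` is a sequential stand-in for M that merely rejects reads issued after writes, and `BufferedSTM` plays the role of M'. As in the text, repeated reads of the same t-object are not deduplicated.

```python
ABORT = "A"


class ToyCanonicSTM:
    """Hypothetical stand-in for M: accepts only reads-then-writes transactions."""

    def __init__(self):
        self.store = {}      # committed t-object values (0 if absent)
        self.log = []        # every call made on M, in invocation order
        self._pending = {}   # k -> {X: v} writes awaiting commit

    def read(self, k, x):
        assert k not in self._pending, "canonic transactions read before writing"
        self.log.append(("read", k, x))
        return self.store.get(x, 0)

    def write(self, k, x, v):
        self.log.append(("write", k, x, v))
        self._pending.setdefault(k, {})[x] = v
        return "ok"

    def try_commit(self, k):
        self.log.append(("tryC", k))
        self.store.update(self._pending.pop(k, {}))
        return "C"

    def try_abort(self, k):
        self.log.append(("tryA", k))
        self._pending.pop(k, None)
        return ABORT


class BufferedSTM:
    """Sketch of M': keeps Wset(T_k) in a local variable and flushes it at tryC."""

    def __init__(self, m):
        self.m = m
        self.wset = {}  # k -> {X: v}; only local state, no base-object accesses

    def read(self, k, x):
        local = self.wset.setdefault(k, {})
        return local[x] if x in local else self.m.read(k, x)

    def write(self, k, x, v):
        self.wset.setdefault(k, {})[x] = v  # replaces any earlier (X, v')
        return "ok"

    def try_commit(self, k):
        for x, v in self.wset.pop(k, {}).items():
            if self.m.write(k, x, v) == ABORT:
                return ABORT
        return self.m.try_commit(k)

    def try_abort(self, k):
        self.wset.pop(k, None)
        return self.m.try_abort(k)


m = ToyCanonicSTM()
stm = BufferedSTM(m)
stm.write(1, "X", 5)
r_x = stm.read(1, "X")   # served from the local write set
r_y = stm.read(1, "Y")   # forwarded to M
stm.write(1, "X", 7)
outcome = stm.try_commit(1)
```

Although the user's transaction interleaves reads and writes, the calls recorded in `m.log` form a canonic transaction: one read, one write, then the commit attempt.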

Therefore, in the rest of the paper, we only consider canonic transactions, which simplifies the 
analysis without sacrificing generality. 

Disjoint-access parallelism. In STM implementations, it is considered important to allow
transactions that are not related through the data sets they access to execute independently.

Let I be a fragment of an execution E. Following [17, 8], we first define a conflict graph
which relates transactions that are live in I. Vertices of the graph represent t-objects. The
vertices representing distinct t-objects X and Y are related with an edge if and only if there is
a transaction T such that $\{X, Y\} \subseteq Dset(T)$ and the interval of T overlaps with I in E.

Two transactions $T_i$ and $T_j$ are disjoint-access in E if there is no path between an item
in $Dset(T_i)$ and an item in $Dset(T_j)$ in the conflict graph of the minimal execution interval
containing the intervals of $T_i$ and $T_j$.
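The conflict graph and the disjoint-access test can be illustrated with a small graph search. This sketch assumes its input is already restricted to the transactions whose intervals overlap the relevant execution interval, given as a hypothetical mapping from transaction names to data sets. A shared t-object is treated as a path of length zero.

```python
from itertools import combinations


def conflict_graph(dsets):
    """Vertices are t-objects; X and Y are adjacent if some Dset contains both."""
    graph = {}
    for dset in dsets.values():
        for x in dset:
            graph.setdefault(x, set())
        for x, y in combinations(dset, 2):
            graph[x].add(y)
            graph[y].add(x)
    return graph


def reachable(graph, sources):
    """All t-objects connected by a path to some t-object in `sources`."""
    seen, stack = set(sources), list(sources)
    while stack:
        for y in graph.get(stack.pop(), ()):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


def disjoint_access(dsets, ti, tj):
    """True if no path links an item of Dset(ti) to an item of Dset(tj)."""
    graph = conflict_graph(dsets)
    return reachable(graph, dsets[ti]).isdisjoint(dsets[tj])


# T3 bridges T1 and T2 through Y and Z.
with_bridge = {"T1": {"X", "Y"}, "T2": {"Z"}, "T3": {"Y", "Z"}}
without_bridge = {"T1": {"X", "Y"}, "T2": {"Z"}}
```

With the bridging transaction present, T1 and T2 are not disjoint-access even though their data sets do not intersect; without it, they are.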

Two processes concurrently contend on a base object x in a given configuration if they have
pending events on x in the configuration and one of these events is non-trivial. 
Definition 4 An STM implementation M is disjoint-access parallel (DAP) if, for all
executions E of M, two processes executing $T_i$ and $T_j$ concurrently contend on the same base object
in E only if $T_i$ and $T_j$ are not disjoint-access.

The following lemma is inspired by [8]: 

Lemma 5 Let E be an execution of a DAP STM implementation M in which a complete
execution of $T_1$ is immediately followed by a (possibly incomplete) execution of $T_2$ such that $T_1$
and $T_2$ are disjoint-access. Then there does not exist a base object x such that both processes
executing $T_1$ and $T_2$ access x in E and one of the accessing events is non-trivial.

Proof. Let $E_0 \cdot E_1 \cdot E_2$ be the prefix of E, where $E_1$ is the fragment of E consisting of the
complete execution of $T_1$, and $E_2$ is the fragment of E consisting of the execution of $T_2$.

Suppose, by contradiction, that $T_1$ writes to a base object that is accessed by $T_2$. Let $E_2'$ be
the longest fragment of $E_2$ that does not contain the first event on a base object that is accessed by
$T_1$ in $E_1$ with a non-trivial event, and let x be this base object. Let $E_1'$ be the longest prefix of
$E_1$ that does not contain the first non-trivial event on x. Since before accessing x, $T_1$ does not
observe $T_2$, $E_0 \cdot E_1' \cdot E_2'$ is an execution of M. By construction, $T_1$ and $T_2$ are disjoint-access in
$E_0 \cdot E_1' \cdot E_2'$. But in the resulting configuration, the processes executing $T_1$ and $T_2$ concurrently
contend on x, a contradiction.

Now suppose that $T_2$ writes to a base object that is accessed by $T_1$. Let $E_2''$ be the longest
prefix of $E_2$ that does not contain the first non-trivial event of $T_2$ on a base object that is
accessed by $T_1$ in $E_1$, and let y be this object. Let $E_1''$ be the longest prefix of $E_1$ that does not
contain the first event on y in $E_1$. Since, as we showed above, $T_2$ does not observe the presence
of $T_1$ in E, $E_0 \cdot E_1'' \cdot E_2''$ is an execution of M. But, again, we obtained a configuration in which
the processes executing $T_1$ and $T_2$ concurrently contend on y, a contradiction. □



Definition 6 An STM implementation M provides strict data partitioning if every t-object
X is associated with a set of base objects $\beta(X)$ such that $\forall X \neq Y$, $\beta(X) \cap \beta(Y) = \emptyset$, and a
transaction $T_i$ can access a base object in $\beta(X)$ only if $X \in Dset(T_i)$.

Any STM that provides strict data partitioning is also disjoint-access parallel (but not vice
versa).

Invisible reads and single-version opacity. An STM implementation M uses invisible 
reads if no execution of a tm-read operation incurs a write on a base object. 

Let H be a sequential history. We say that $T_i$ precedes $T_j$ in H in the single-version order,
and we write $T_i \prec_H^{SV} T_j$, if there exists $X \in Wset(T_i) \cap Rset(T_j)$ such that $tryC_i$ precedes
$read_j(X)$ in H.

A sequential history H is single-version opaque if there exists a legal t-sequential history H'
such that:

1. H and H' are equivalent;

2. H' respects $\prec_H$ and $\prec_H^{DU}$; and

3. H' respects $\prec_H^{SV}$.

Now an STM implementation M is single-version opaque if for every execution E of M,
there exists a single-version opaque linearization of $E|_{TM}$. Intuitively, a single-version opaque
implementation is opaque and maintains exactly one copy of a t-object's state at any given
moment.
4 Liveness and Progress



To describe the conditions under which a TM implementation does something useful, we need 
to address two orthogonal dimensions. First, we need to give a tm-liveness property [3] that 
determines the conditions under which an individual tm-operation must return. Second, we 
need to give a progress condition that describes the cases in which a transaction must commit. 

4.1 TM-liveness properties 

A TM implementation M is wait-free if in every infinite execution of M, each tm-operation 
returns in a finite number of its own steps, regardless of the behavior of concurrent transactions. 
In other words, a wait-free individual tm-operation (read, write, tryC or tryA) cannot be 
delayed because of a concurrent operation. The property can be very beneficial if executions of 
transactions are subject to unpredictable delays or failures. 

In this paper, we do not assume failures: every operation is expected to take steps until it 
terminates. Moreover, we are interested in deriving inherent costs of implementing non-trivial 
concurrency in TM. Therefore, we assume a weaker default tm-liveness guarantee, that we 
call starvation-freedom. A TM implementation M is starvation-free if in every infinite execution
of M, each tm-operation eventually returns, assuming that no concurrent tm-operation stops 
indefinitely before returning. Starvation-freedom allows a tm-operation to be delayed only by 
a concurrent tm-operation. 

4.2 Progress conditions 

A progress condition determines the scenarios in which a transaction is allowed to abort. Tech- 
nically, unlike tm-liveness, a progress condition is a safety property [3], since it can be violated 
in a finite execution. The simplest non-trivial progress property we consider in this paper is 
single-lock progressiveness that says that a transaction can only abort if there is a concurrent 
transaction. Clearly, an opaque single-lock TM can be implemented using any mutual exclusion 
algorithm [24] with one critical section per transaction. Stronger progress conditions allow some
transactions to progress concurrently in some scenarios, implying positive concurrency.

Progressiveness allows an implementation to abort a transaction only in case of a conflict. 
Transactions $T_i, T_j$ conflict in a history H on a t-object X if $T_i$ and $T_j$ are concurrent in H,
$X \in Dset(T_i) \cap Dset(T_j)$, and $X \in Wset(T_i) \cup Wset(T_j)$.
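The conflict condition is a simple set computation once the read and write sets are known. The helper below is an illustrative sketch: its name and the boolean `concurrent` flag, which stands for "the two transactions are concurrent in H", are assumptions of the example.

```python
def conflict_objects(rset_i, wset_i, rset_j, wset_j, concurrent):
    """T-objects on which two transactions conflict.

    X qualifies if the transactions are concurrent, X is in both data sets,
    and at least one of the two transactions writes X.
    """
    if not concurrent:
        return set()
    dset_i = rset_i | wset_i
    dset_j = rset_j | wset_j
    return (dset_i & dset_j) & (wset_i | wset_j)


# A reader of X and Y running concurrently with a writer of X.
read_write = conflict_objects({"X", "Y"}, set(), set(), {"X"}, True)
# Two concurrent readers of X never conflict.
read_read = conflict_objects({"X"}, set(), {"X"}, set(), True)
```

Under progressiveness, a transaction may be forcefully aborted only when this set is non-empty for some concurrent transaction.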

Definition 7 A TM implementation M is (weakly) progressive if for every history H of M
and every transaction $T_i \in parts(H)$ that is forcefully aborted, there exists a prefix H' of H and
a transaction $T_k \in parts(H')$ that is live in H', such that $T_k$ and $T_i$ conflict in H'.

The strong progressiveness property [Hj additionally requires that in case a set of transactions
conflict on a single t-object, at least one transaction commits. The formal definition is inspired
by [15].

Let $CObj_H(T_i)$ denote the set of t-objects over which transaction $T_i \in parts(H)$ conflicts
with any other transaction in history H, i.e., $X \in CObj_H(T_i)$ if there exists a transaction
$T_k \in parts(H)$, $k \neq i$, such that $T_i$ conflicts with $T_k$ on X in H. Then,
$CObj_H(Q) = \bigcup_{T_i \in Q} CObj_H(T_i)$ denotes the union of sets $CObj_H(T_i)$ for all transactions in Q.

Let $CTrans(H)$ denote the set of non-empty subsets of $parts(H)$ such that a set Q is in
$CTrans(H)$ if no transaction in Q conflicts with a transaction not in Q.

^ This does not include transactions that guarantee obstruction-freedom [12].
Definition 8 A TM implementation M is strongly progressive if there does not exist any
history H of M in which for every set $Q \in CTrans(H)$ of transactions such that $|CObj_H(Q)| \le 1$,
every transaction in Q is forcefully aborted in H.

But since the goal of this paper is to derive a lower bound, we consider weakly progressive
implementations (from now on, simply progressive).

Let C be any correctness property, i.e., any safety property on TM histories [3]. The 
following property guarantees that no transaction is forcefully aborted if there is a chance to
commit the transaction and preserve correctness.

Definition 9 A TM implementation M is permissive with respect to C if for every history H
of M such that H ends with a response $r_k$ and replacing $r_k$ with some $r_k' \neq A_k$ gives a history
that satisfies C, we have $r_k \neq A_k$.

Therefore, permissiveness does not allow a transaction to abort, unless committing it would vio- 
late the execution's correctness. In this paper, we consider TM implementations that are permis- 
sive with respect to opacity. Clearly, permissiveness with respect to opacity is strictly stronger 
than progressiveness: every permissive opaque implementation is also progressive opaque, but 
not vice versa. 

A transaction in a permissive opaque implementation can only be forcefully aborted if it 
tries to commit: 

Lemma 10 Let a TM implementation M be permissive with respect to opacity. If a transaction
$T_i$ is forcefully aborted executing an operation $op_i$, then $op_i$ is $tryC_i$.

Proof. Suppose, by contradiction, that there exists a history H of M such that some $op_i \in
\{read_i, write_i\}$ executed within a transaction $T_i$ returns $A_i$. Let $H_0$ be the shortest prefix of H
that ends just before $op_i$ returns. By definition, $H_0$ is opaque and any history $H_0 \cdot r_i$ where
$r_i \neq A_i$ is not opaque. Let $H_0'$ be the serialization of $H_0$.

If $op_i$ is a write, then $H_0 \cdot ok_i$ is also opaque: no write operation of the incomplete transaction
$T_i$ appears in $H_0'$ and, thus, $H_0'$ is also a serialization of $H_0 \cdot ok_i$.

If $op_i$ is a $read(X)$ for some t-object X, then we can construct a serialization of $H_0 \cdot v$ where
v is the value of X written by the last committed transaction in $H_0'$ preceding $T_i$ or the initial
value of X if there is no such transaction. It is easy to see that $H_0''$ obtained from $H_0'$ by adding
$read(X) \cdot v$ at the end of $T_i$ is a serialization of $H_0 \cdot v$. In both cases, there exists a non-$A_i$
response to $op_i$ that preserves opacity of $H_0 \cdot r_i$, and, thus, the only operation that can be
forcefully aborted in an execution of M is tryC. □



Obviously, Lemma 10 implies that there does not exist a permissive single-version TM
implementation.



Multi-version permissiveness. A relaxation of permissiveness, called multi-version
permissiveness (or mv-permissiveness) [23], says that a transaction $T_i$ can only abort if $T_i$ is updating
and there is a concurrent conflicting updating transaction $T_j$, i.e., a read-only transaction cannot
be aborted.

Lemma 11 There does not exist an mv-permissive TM implementation M that guarantees
(wait-freedom) starvation-freedom of individual tm-operations and single-version opacity.

Proof. By contradiction, suppose that there exists a single-version opaque mv-permissive
implementation M. Consider an execution of M in which transaction $T_1$ sequentially reads X,
then transaction $T_2$ writes to X and Y and commits. Such an execution exists, since none of
these operations can be forcefully aborted in an mv-permissive implementation. Now we extend
this history with $T_1$ reading Y. There is no way to serialize $T_1$ and $T_2$ preserving single-version
opacity, unless $read_1(Y)$ aborts. But an mv-permissive TM implementation does not allow a
read-only transaction to return abort, a contradiction. □

If we relax our tm-liveness property and allow a tm-operation to be delayed by a concurrent 
conflicting transaction, then a single-version mv-permissive implementation is possible [7].

Probabilistic permissiveness. Intuitively, a probabilistically permissive TM ensures the
property of Definition 9 with a positive probability. It is conjectured in [11] that probabilistically
permissive (with respect to opacity) implementations can be considerably cheaper than
deterministic ones. This is achieved by choosing the response to a tm-operation $op_k$ by sampling
uniformly at random from the set of possible return values (including $A_k$).

Definition 12 A TM implementation M is probabilistically permissive with respect to C if for
every history H of M such that H ends with a response rk and replacing rk with some r'k ≠ Ak
gives a history that satisfies C, we have rk ≠ Ak with positive probability.

5 RAW/AWAR complexity 

Modern CPU architectures perform reordering of memory references for better performance. 
Hence, memory barriers/fences are needed to enforce ordering in synchronization primitives 
whose correct operation depends on ordered memory references. Attiya et al. [6] formalized 
the RAW/AWAR class of synchronization patterns and showed that a wide class of concurrent 
algorithm implementations must involve these expensive patterns. We recall the definitions 
below. 

Let π be an execution fragment and let πi denote the i-th event in π (i = 0, ..., |π| − 1). We
say that process p performs a RAW (read-after-write) in π if ∃i, j: 0 ≤ i < j < |π| such that

• πi is a write to a base object x by process p,

• πj is a read of a base object y ≠ x by process p and

• there is no πk such that i < k < j and πk is a write to y by p.

We say that two RAWs by process p overlap in an execution E if the read event of the first
RAW occurs after the write event of the second RAW. A multi-RAW consists of a series of writes
to a set of base objects followed by a series of reads from different base objects.
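
The RAW definition translates directly into a check over a recorded execution fragment. The
sketch below is a minimal illustration; representing an event as a (process, kind, object)
tuple and the name has_raw are assumptions introduced here.

```python
def has_raw(events, p):
    """Return True if process p performs a RAW in the fragment `events`.

    Each event is a tuple (process, kind, obj) with kind "w" (write) or "r" (read).
    """
    for i, (pi, ki, x) in enumerate(events):
        if pi != p or ki != "w":
            continue                      # event i must be a write by p
        for j in range(i + 1, len(events)):
            pj, kj, y = events[j]
            if pj != p or kj != "r" or y == x:
                continue                  # event j must be a read by p of some y != x
            between = events[i + 1:j]
            # No write to y by p may occur strictly between the write and the read.
            if not any(q == p and k == "w" and o == y for q, k, o in between):
                return True
    return False

print(has_raw([("p", "w", "x"), ("p", "r", "y")], "p"))                   # True
print(has_raw([("p", "w", "x"), ("p", "w", "y"), ("p", "r", "y")], "p"))  # False
```

In the second fragment the read of y follows p's own write to y, so it does not count as a
RAW under the definition above.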

We say a process p performs an AWAR (atomic-write-after-read) in π if ∃i, j: 0 ≤ i < j < |π|
such that

• πi is a read of a base object x by process p,

• πj is a write to a base object y by process p and

• πi and πj belong to the same atomic section.

Examples of AWAR are CAS and mCAS.
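
The AWAR definition can be checked in a similar way once events carry the atomic section they
belong to. In this sketch an event is a (process, kind, object, section) tuple, where section
is None for events outside any atomic section; this encoding and the name has_awar are
assumptions introduced for illustration. A successful CAS is modelled as a read and a write
of the same object inside one section.

```python
def has_awar(events, p):
    """Return True if process p performs an AWAR in the fragment `events`."""
    for i, (pi, ki, _, si) in enumerate(events):
        if pi != p or ki != "r" or si is None:
            continue                      # need a read by p inside an atomic section
        for pj, kj, _, sj in events[i + 1:]:
            # A later write by p in the same atomic section completes the AWAR.
            if pj == p and kj == "w" and sj == si:
                return True
    return False

successful_cas = [("p", "r", "x", 1), ("p", "w", "x", 1)]
failed_cas = [("p", "r", "x", 1)]                      # comparison fails, nothing written
plain_steps = [("p", "r", "x", None), ("p", "w", "y", None)]

print(has_awar(successful_cas, "p"))  # True
print(has_awar(failed_cas, "p"))      # False
print(has_awar(plain_steps, "p"))     # False
```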
Figure 1: Execution E of a permissive, opaque STM: T2 and T3 force T1 to perform a RAW/AWAR in
each R1(Xk), 2 ≤ k ≤ m

6 RAW/AWAR cost of permissive STMs 

In this section, we show that an execution of a transaction in a permissive STM implementation
may be required to perform at least one RAW/AWAR pattern per tm-read.

Let M be a permissive, opaque TM implementation. Consider an execution E of M with
a history H consisting of transactions T1, T2, T3 as shown in Figure 1. T3 performs a read of
X1, then T2 performs a write on X1 and commits, and finally T1 performs a series of reads
from objects X1, ..., Xm. Here, Rk(X) and Wk(X, v) denote complete executions of readk(X)
and writek(X, v), respectively. Since the implementation is permissive, no transaction can be
forcefully aborted in E, and the only valid serialization of this execution is T3, T2, T1. Note
also that the execution generates a sequential history: each invocation of a tm-operation is
immediately followed by a matching response in H. Thus, since we assume starvation-freedom
as a liveness property, such an execution exists.

Imagine that we modify the execution E as follows. Immediately after R1(Xk) executed by
T1 we add W3(Xk, v) and tryC3 executed by T3 (let TC3(Xk) denote the complete execution
of W3(Xk, v) followed by tryC3). Obviously, TC3(Xk) must return abort: T3 cannot be
serialized before T1, nor can T1 be serialized before T3. On the other hand, if TC3(Xk) takes
place just before R1(Xk), then TC3(Xk) must return commit but R1(Xk) must return the value
written by T3. In other words, R1(Xk) and TC3(Xk) are strongly non-commutative [6]: both
of them see the difference when ordered differently. As a result, intuitively, R1(Xk) needs to
perform a RAW or AWAR to make sure that the order of these two "conflicting" operations is
properly maintained. A formal proof follows.

Theorem 13 Let M be a permissive opaque STM implementation. Then, for any m ∈ ℕ, M
has an execution in which some transaction performs m tm-reads such that the execution of
each tm-read contains at least one RAW or AWAR.

Proof. We consider R1(Xk), 2 ≤ k ≤ m, in execution E.

Imagine a modification E' of E, in which T3 performs W3(Xk) immediately after R1(Xk)
and then tries to commit. A serialization of H' = E'|TM should obey T3 ≺H' T2 and T2 ≺H' T1.
Suppose, by contradiction, that the execution of R1(Xk) does not modify base objects; hence,
T3 does not observe R1(Xk) in E'. Since M is permissive, T3 must commit in E'. But since T1
performs R1(Xk) before T3 commits and T3 updates Xk, we also have T1 ≺H' T3. Thus, T3 cannot
precede T1 in any serialization, a contradiction. Consequently, each R1(Xk) must perform a
write to a base object.

Let π be a fragment of E that represents the complete execution of R1(Xk). Clearly, π
contains a write to a base object. Let πj be the first write to a base object in π and let πw
denote the shortest fragment of π that contains the atomic section to which πj belongs (πw = πj
if πj is not part of an atomic section). Thus, π can be represented as πs · πw · πf.

Suppose that π does not contain a RAW or AWAR. Since πw does not contain an AWAR,
there are no read events in πw that precede πj. Thus, πj is the first base object event in
πw. Consider the execution fragment πs · ρ, where ρ is the complete execution of TC3(Xk) by
T3. Such an execution exists since πs does not perform any base object write, hence, πs · ρ is
indistinguishable to T3 from ρ.

Since, by our assumption, πw · πf contains no RAW, any read performed in πw · πf can
only be applied to base objects previously written in πw · πf. Thus, there exists an execution
πs · ρ · πw · πf that is indistinguishable to T1 from π. In πs · ρ · πw · πf, T3 commits (as in ρ)
but T1 ignores the value written by T3 to Xk. But T3, T2, T1 is the only valid serialization for
E|TM, a contradiction. Thus, each R1(Xk), 2 ≤ k ≤ m, must contain a RAW/AWAR.

Note that since all tm-reads of T1 are executed sequentially, all these RAW/AWAR patterns
are pairwise non-overlapping. □



7 RAW/AWAR cost and protected data in progressive STMs 

In this section, we first describe our progressive STM implementations that perform at most 
one RAW/AWAR per transaction. Then we present a lower bound on the amount of data to 
be protected by a transaction in a progressive STM. 

7.1 Constant RAW/AWAR implementations for progressive STM 

We start by showing that even a single-lock progressive STM cannot avoid performing one
RAW/AWAR per transaction in some executions.

Theorem 14 Let M be a single-lock progressive opaque STM implementation. Then every 
execution of M in which an uncontended transaction performs at least one read and at least one 
write contains a RAW/AWAR pattern. 

Proof. Consider an execution π of M in which an uncontended transaction T1 performs (among
other events) read1(X), write1(Y, v) and tryC1(). Since M is single-lock progressive, T1 must
commit in π. Clearly π must contain a write to a base object. Otherwise a subsequent
transaction reading Y would return the initial value of Y instead of the value written by T1.

Let πj be the first write to a base object in π and let πw denote the shortest fragment of
π that contains the atomic section to which πj belongs (πw = πj if πj is not part of an atomic
section). Thus, π can be represented as πs · πw · πf.

Now suppose, by contradiction, that π contains neither RAW nor AWAR patterns. Since
πw contains no AWAR, there are no read events in πw that precede πj. Since πj is the first
write event in π, it follows that πj is the first base-object event in πw.

Since πs contains no writes, the states of base objects in the initial configuration and in the
configuration after πs is performed are the same. Consider an execution πs · ρ where in ρ, a
transaction T2 performs read2(Y), write2(X, 1), tryC2() and commits. Such an execution exists,
since ρ is indistinguishable to T2 from an execution in which T2 is uncontended and thus T2
cannot be forcefully aborted in πs · ρ.

Since πw · πf contains no RAWs, every read performed in πw · πf is applied to base objects
which were previously written in πw · πf. Thus, there exists an execution πs · ρ · πw · πf such
that T1 cannot distinguish πs · πw · πf and πs · ρ · πw · πf. Hence, T1 commits in πs · ρ · πw · πf.

But T1 reads the initial value of X and T2 reads the initial value of Y in πs · ρ · πw · πf,
and thus T1 and T2 cannot both be committed (at least one of the committed transactions must
read the value written by the other), a contradiction.

The proof is analogous in the case when an execution of T1 extends any execution π0 that
contains only complete transactions. □






Since every progressive or permissive STM implementation is also single-lock progressive, the
RAW/AWAR lower bound of Theorem 14 also holds for progressive and permissive STM
implementations. The lower bound is actually tight, and we sketch two progressive opaque
implementations. Both implementations are strict data-partitioned [15] (split the set of base
objects used into disjoint subsets, each subset storing information of only a single t-object)
and single-version (maintain exactly one copy of a t-object's state at a time). They also use
invisible reads, i.e., no execution of a tm-read operation performs a write to a base object.

Our first implementation employs an mCAS primitive* and works, in brief, as follows. Every
t-object Xi is associated with a distinct base object vi that stores the "most recent" value of Xi
together with the id of the transaction that was the last to update Xi. Each time a transaction
Tk performs a read of a t-object Xi, it reads vi, adds Xi to its read set and checks if the t-objects
in the current read set of Tk have not been updated since Tk has read them. If this is not the
case the transaction is forcefully aborted. Otherwise, Tk returns the value read in vi. Each time
Tk performs a write to a t-object Xi, it adds Xi to its write set and returns ok.

For every updating transaction Tk, tryCk() invokes the mCAS primitive over Dset(Tk). If
the mCAS returns true, tryCk() returns Ck, otherwise it returns Ak. Clearly, if Tk is forcefully
aborted, then the execution of mCAS involved no AWAR (no write to a base object took place).
Read-only transactions simply return Ck. Consequently, the implementation incurs a single
AWAR per updating committed transaction.
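
The mCAS-based scheme can be sketched in a few lines of Python. This is an illustration under
stated assumptions, not the paper's implementation: the class names MCasMemory, Transaction
and Abort are invented here, a threading.Lock stands in for the atomicity of mCAS, transaction
ids are assumed unique, and the expected value of an object that is written but never read is
taken from a read performed at commit time.

```python
import threading

class Abort(Exception):
    """Raised when a tm-read forcefully aborts the transaction."""

class MCasMemory:
    """Base objects v_i, each holding (value, id of last updating transaction)."""
    def __init__(self, names):
        self.cells = {x: (0, 0) for x in names}
        self._guard = threading.Lock()     # assumption: simulates an atomic section

    def read(self, x):
        return self.cells[x]

    def mcas(self, expected, new):
        """Atomically install `new` if every cell in `expected` is unchanged."""
        with self._guard:
            if any(self.cells[x] != old for x, old in expected.items()):
                return False               # no base object is written on failure
            self.cells.update(new)
            return True

class Transaction:
    def __init__(self, mem, tid):
        self.mem, self.tid = mem, tid
        self.rset = {}                     # t-object -> (value, writer) first observed
        self.wset = {}                     # t-object -> value to install at commit

    def read(self, x):
        self.rset.setdefault(x, self.mem.read(x))
        # Invisible read: validate the whole read set without writing anything.
        if any(self.mem.read(y) != seen for y, seen in self.rset.items()):
            raise Abort()
        return self.rset[x][0]

    def write(self, x, value):
        self.wset[x] = value
        return "ok"

    def try_commit(self):
        if not self.wset:
            return True                    # read-only: commits without an AWAR
        expected = dict(self.rset)
        for x in self.wset:
            expected.setdefault(x, self.mem.read(x))
        new = {x: (v, self.tid) for x, v in self.wset.items()}
        return self.mem.mcas(expected, new)   # the single AWAR of the transaction
```

A transaction whose read set was overwritten by a concurrent committed writer fails its mCAS
and aborts, while an uncontended updating transaction commits with exactly one mCAS call.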

Theorem 15 There exists a progressive opaque STM implementation with wait-free operations
that employs exactly one AWAR per transaction. Moreover, no AWARs are performed in
read-only or aborted transactions.

Even if we do not use atomic sections (and, thus, AWARs) we still can implement a progressive
opaque STM using reads and writes that incurs only a single multi-RAW (and, thus, just a
single fence) per update transaction. This implementation uses a simple multi-trylock
primitive, which in turn can be implemented with a single multi-RAW. The multi-trylock
primitive exports operations acquire(W), release(W) and isContended(X), for all sets of
t-objects W and all t-objects X. Informally, if there is no contention on the locks on objects
in W, then acquire(W) returns true which means that exclusive locks on all objects in W are
acquired. Otherwise, acquire(W) returns false which means that no locks on objects in W are
acquired. Operation release(W) releases the acquired locks on objects in W and isContended(X)
returns true iff a lock on X is currently held by any process. The implementation of acquire(W)
first writes to a series of base objects and then reads a series of base objects incurring a
single multi-RAW, while operations release(W) and isContended(X) incur no RAW.

Implementations of reads and writes are similar to the ones described above, except that each
time a transaction Tk performs a read of a t-object Xi, it additionally checks if no object in the
current read set is locked by an updating transaction. If some object in the read set has been
modified or is locked, the transaction is forcefully aborted.

For every updating transaction Tk, tryCk() invokes acquire(Wset(Tk)). If it returns true,
tryCk() returns Ck, otherwise it returns Ak. Read-only transactions simply return Ck.
Consequently, the implementation incurs a single multi-RAW per updating transaction.

Theorem 16 There exists a progressive opaque STM implementation with wait-free operations 
that employs a single multi-RAW per transaction. Moreover, no RAWs are performed in read- 
only transactions. 



* In mCAS(V, OV, NV), executed atomically, a process reads an array V of m objects, and if for
each i, V[i] = OV[i], it replaces each V[i] with NV[i] and returns true; otherwise it returns
false and leaves the objects unchanged.
We also derive a strongly progressive STM using only reads and writes that incurs at most
four RAWs per updating transaction and uses a finite number of bounded registers. Our
implementation uses a starvation-free multi-trylock primitive inspired by the Black-White
Bakery Algorithm [25], a bounded version of the Bakery Algorithm [18].

Informally, if no concurrent process contends infinitely long on some X ∈ W, then the
acquire(W) operation of the starvation-free multi-trylock eventually returns true which means
that exclusive locks on all objects in W are acquired. The implementation of acquire(W) incurs
three RAWs, while operation release(W) performs a single RAW.

Implementations of tm-reads and tm-writes are identical to the constant RAW progressive
implementation described above. For every updating transaction Tk, tryCk() invokes the acquire
operation of the starvation-free multi-trylock over Wset(Tk). Note that this always returns true
and a transaction with Rset(Tk) = ∅ eventually returns Ck. Read-only transactions simply
return Ck. Consequently, the implementation incurs four RAWs per updating transaction.

Theorem 17 There exists a strongly progressive single-version opaque STM implementation 
with starvation-free operations that uses invisible reads and employs four RAWs per transaction. 
Moreover, no RAWs are performed in read-only transactions. 

Note that our implementation does not violate the impossibility result of Guerraoui and
Kapalka [15] who proved that a strongly progressive opaque STM cannot be implemented using
only reads and writes if tm-operations are required to be wait-free.



7.2 Protected data 

Let M be a progressive STM implementation. Intuitively, a t-object Xj is protected at the end
of some finite execution π of M if some transaction T0 is about to atomically change the value
of Xj in its next step (e.g., by performing a CAS operation) or does not allow any concurrent
transaction to read Xj (e.g., by holding a lock on Xj).

Formally, let α · π be an execution of M such that π is an uncontended complete execution
of a transaction T0, where Wset(T0) = {X1, ..., Xm}. Let uj (j = 1, ..., m) denote the value
written by T0 to t-object Xj in π. We say that π' is a proper prefix of π if π' is a prefix of π and
every atomic section is complete in π'. In this section, let π^t denote the t-th shortest proper
prefix of π. Let π^0 denote the empty prefix. (Recall that an atomic event is either a tm-event,
a read or write on a base object, or an atomic section.)

For any Xj ∈ Wset(T0), let Tj denote a transaction that tries to read Xj and commit. Let
E_j^t = α · π^t · ρ_j^t denote the extension of α · π^t in which Tj runs solo until it completes.
Note that, since we only require the implementation to be starvation-free, ρ_j^t can be infinite.

We say that α · π^t is (1, j)-valent if the read operation performed by Tj in α · π^t · ρ_j^t returns
uj (the value written by T0 to Xj). We say that α · π^t is (0, j)-valent if the read operation
performed by Tj in α · π^t · ρ_j^t does not abort and returns an "old" value u ≠ uj. Otherwise, if
the read operation of Tj aborts or never returns in α · π^t · ρ_j^t, we say that α · π^t is
(⊥, j)-valent.

Definition 18 We say that T0 protects an object Xj in α · π^t, where π^t is the t-th shortest
proper prefix of π (t ≥ 0), if one of the following conditions holds: (1) α · π^t is (0, j)-valent
and α · π^{t+1} is (1, j)-valent, or (2) α · π^t or α · π^{t+1} is (⊥, j)-valent.
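
Definition 18 can be read as a predicate on the sequence of valences of successive proper
prefixes. The sketch below is illustrative: the list encoding (entry t holds the j-valence of
the t-th prefix) and the names BOTTOM and protects are assumptions introduced here.

```python
BOTTOM = "⊥"   # the read of Tj aborts or never returns

def protects(valence, t):
    """True if T0 protects Xj in the t-th prefix, given the j-valences of all prefixes.

    valence[t] is 0, 1 or BOTTOM; t + 1 must be a valid index.
    """
    cur, nxt = valence[t], valence[t + 1]
    switches = cur == 0 and nxt == 1          # condition (1): value changes in one step
    blocked = cur == BOTTOM or nxt == BOTTOM  # condition (2): readers are kept out
    return switches or blocked

# A CAS-like update: the value becomes visible in a single step.
cas_like = [0, 0, 1, 1]
# A lock-like update: readers abort while the lock is held (prefixes 2 and 3).
lock_like = [0, 0, BOTTOM, BOTTOM, 1, 1]

print([t for t in range(len(cas_like) - 1) if protects(cas_like, t)])    # [1]
print([t for t in range(len(lock_like) - 1) if protects(lock_like, t)])  # [1, 2, 3]
```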

For disjoint-access parallel (DAP) progressive STM, we show that every uncontended
transaction must protect every object in its write set at some point of its execution.

We observe that no prefix of π can be 0-valent and 1-valent at the same time (notations used
are the same as introduced in Section 7.2).
Lemma 19 There does not exist π^t, a proper prefix of π, and i, j ∈ {1, ..., m} such that α · π^t
is both (0, i)-valent and (1, j)-valent.

Proof. By contradiction, suppose that there exist i, j and α · π^t that is both (0, i)-valent and
(1, j)-valent. Since the implementation is DAP, by Lemma [sj there exists an execution of M,
E^t_ji = α · π^t · ρ_j^t · ρ_i^t, that is indistinguishable to Ti from α · π^t · ρ_i^t. In E^t_ji, the
only possible serialization is T0, Tj, Ti. But Ti returns the "old" value of Xi and, thus, the
serialization is not legal, a contradiction. □

If α · π^t is (0, i)-valent (resp., (1, i)-valent) for some i, we say that it is 0-valent (resp.,
1-valent). By Lemma 19, the notions of 0-valence and 1-valence are well-defined.



Theorem 20 Let M be a progressive, opaque and disjoint-access-parallel STM implementation.
Let α · π be an execution of M, where π is an uncontended complete execution of a transaction
T0. Then there exists π^t, a proper prefix of π, such that T0 protects |Wset(T0)| t-objects in
α · π^t.

Proof. Let Wset(T0) = {X1, ..., Xm}. Consider two cases:

(1) Suppose that π has a proper prefix π^t such that α · π^t is 0-valent and α · π^{t+1} is 1-valent.
By Lemma 19, there does not exist i such that α · π^t is (1, i)-valent and α · π^{t+1} is
(0, i)-valent. Thus, one of the following is true:

• For every i ∈ {1, ..., m}, α · π^t is (0, i)-valent and α · π^{t+1} is (1, i)-valent

• At least one of α · π^t and α · π^{t+1} is (⊥, i)-valent, i.e., the operation of Ti aborts or
never returns

In either case, T0 protects m t-objects in α · π^t.

(2) Now suppose that such π^t does not exist, i.e., there is no i ∈ {1, ..., m} and
t ∈ {0, ..., |π| − 1} such that E_i^t exists and returns an old value, and E_i^{t+1} exists and
returns a new value.

Suppose there exist s, t, 0 < s + 1 < t, S ⊆ {1, ..., m}, such that:

• α · π^s is 0-valent,

• α · π^t is 1-valent,

• for all r, s < r < t, and for all i ∈ S, α · π^r is (⊥, i)-valent.

We say that s + 1, ..., t − 1 is a protecting fragment for t-objects {Xi | i ∈ S}.

Since M is opaque and progressive, α · π^0 = α is 0-valent and α · π is 1-valent. Thus, the
assumption of Case (2) implies that for each Xi, there exists a protecting fragment for
{Xi}. In particular, there exists a protecting fragment for {X1}.

Now we proceed by induction. Let π^{s+1}, ..., π^{t−1} be a protecting fragment for
{X1, ..., X_{u−1}} such that u ≤ m.

Now we claim that there must be a subfragment of s + 1, ..., t − 1 that protects {X1, ..., Xu}.

Suppose not. Thus, there exists r, s < r < t, such that α · π^r is (0, u)-valent or
(1, u)-valent. Suppose first that α · π^r is (1, u)-valent. Since α · π^s is (0, i)-valent for some
i ≠ u, by Lemma 19 and the assumption of Case (2), there must exist s', t',
s < s' + 1 < t' < r such that

• α · π^{s'} is 0-valent,

• α · π^{t'} is 1-valent,

• for all r', s' < r' < t', α · π^{r'} is (⊥, u)-valent.

As a result, s' + 1, ..., t' − 1 is a protecting fragment for {X1, ..., Xu}. The case when
α · π^r is (0, u)-valent is symmetric, except that now we should consider fragment r, ..., t
instead of s, ..., r.

Thus, there exists a subfragment of s + 1, ..., t − 1 that protects {X1, ..., Xu}. By
induction, we obtain a protecting fragment s'' + 1, ..., t'' − 1 for {X1, ..., Xm}. Thus, any
prefix α · π^r, where s'' < r < t'', protects exactly m t-objects.

In both cases, there is a proper prefix of α · π that protects exactly m t-objects. □



The lower bound of Theorem 20 is tight: it is matched by all progressive implementations we
are aware of, including the ones in Section 7.1. Note that any DAP single-lock STM
implementation automatically provides a stronger progress condition than just single-lock
progressiveness. A transaction T in a DAP single-lock STM can only be forcefully aborted if it
observes a concurrent transaction T' such that Dset(T) ∩ Dset(T') ≠ ∅. This is not very far from
progressiveness, where T may abort only if T and T' experience a write-write or write-read
conflict on a t-object. Thus, in the realm of DAP STM implementations, progressiveness is very
close to the weakest non-trivial progress condition.



8 Related work 

Crain et al. [9] proved that a permissive opaque TM implementation cannot maintain invisible
reads, which inspired the derivation of our lower bound on RAW/AWAR complexity in Section 6.
The RAW/AWAR complexity for concurrent implementations was recently introduced in
[6]. The proofs of Theorems 13 and 14 extend the arguments used in [6] to the STM context.

A related paper by Attiya et al. [8] showed that every permissive strictly serializable and
DAP TM in which every read-only transaction must commit in a wait-free manner has an
execution in which some read-only transaction performs at least |Dset(Tk)| − 1 base-object
writes. In this paper we do not assume that a read operation must be wait-free and we do not
require disjoint-access parallelism. Also, we focus on the number of RAW/AWAR patterns and
not only base-object writes. On the other hand, we consider a stronger correctness property
(opacity). Therefore, our lower bound in Section 6 is incomparable with the one of [8].

To establish the lower bound on t-objects that must be "protected" in an opaque, progressive
TM (Section 7.2), we use the definition of disjoint-access parallelism introduced in [8].
Guerraoui and Kapalka [15] considered a stronger version of DAP called strict data-partitioning
to prove a linear lower bound on the number of steps performed by a successful read operation
in a progressive, opaque TM that uses invisible reads. Interestingly, the constant RAW/AWAR
implementations of progressive, opaque TMs sketched in Section 7 are strict data-partitioned.



9 Concluding remarks 

In this paper, we derived inherent costs of implementing STMs with non-trivial concurrency 
guarantees. At a high level, our results suggest that providing high degrees of concurrency 
in STM may incur considerable unavoidable costs. Our results give rise to many intriguing 
questions, and we list some of them below. 

In this paper, we focused on progress conditions that provide positive concurrency,
progressiveness and permissiveness. The results do not apply to obstruction-free STMs [12] that
only guarantee that a transaction commits if it eventually runs without contention. Effectively,
an obstruction-free STM provides zero concurrency, since progress is guaranteed only when one
transaction is active at a time. However, unlike single-lock implementations, it does allow
overlapping transactions to make progress (one at a time). Does this incur higher RAW/AWAR
complexity?

We cannot expect the lower bound of Theorem 20 (the protected-data size) to apply to non- 
DAP STMs, including trivial ones that allow storing the state of the whole STM in one base 
object. One way to avoid trivialities is to assume that a base object can store information only 
about a constant number of t-objects (the constant-size information property in [13]) which 
can potentially give asymptotically close results. 

We focused on implementations that allow a tm-operation to be delayed only by concurrent 
operations performed by other transactions. Does relaxing the tm-liveness property by allowing 
a read operation to wait until a concurrent transaction terminates [7] improve the RAW/AWAR 
complexity with respect to permissive implementations? It is easy to see that the proof of our 
permissive lower bound (Theorem 13) does not work for this case. But it is unclear a priori 
how this may affect the cost of progressive implementations. 

Last but not least, the results of this paper assume opacity as a correctness property.
Recently, multiple relaxations of opacity were proposed [10], [2], [9], [8]. It would be very
interesting to understand the concurrency benefits gained by such relaxed consistency conditions.
A Constant RAW/AWAR implementations for progressive TM

This section presents the pseudo-code for single RAW and single AWAR implementations of
progressive opaque STMs and their proofs of correctness. The single RAW implementation uses
a multi-trylock primitive described below, while the single AWAR implementation uses an mCAS
primitive. Finally, we describe the read-write implementation of a strongly progressive STM
that employs at most four RAWs per transaction. In the implementations, every t-object Xj
is associated with a distinct base object vj that stores the "most recent" value of Xj together
with the id of the transaction that was the last to update Xj.

A.1 Multi-trylock



Algorithm 1 Multi-trylock invoked by process pi

 1: Shared variables:
 2:    r_ij, for each process pi and each t-object Xj

 3: acquire(Q):
 4:    for all Xj ∈ Q do
 5:       write(r_ij, 1)
 6:    if ∃Xj ∈ Q; ∃t ≠ i : r_tj = 1 then
 7:       for all Xj ∈ Q do
 8:          write(r_ij, 0)
 9:       return false
10:    return true

11: release(Q):
12:    for all Xj ∈ Q do
13:       write(r_ij, 0)
14:    return ok

15: isContended(Xj):
16:    if ∃pt : r_tj ≠ 0, t ≠ i then
17:       return true
18:    return false

A multi-trylock provides exclusive write-access to a set Q of t-objects. Specifically, a
multi-trylock exports the following operations:

• acquire(Q) returns true or false

• release(Q) releases the lock and returns ok

• isContended(Xj), Xj ∈ Q returns true or false

We assume that processes are well-formed: they never invoke a new operation on the
multi-trylock before receiving a response from the previous invocation.
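
The three operations can be sketched in Python as follows. This is a minimal illustration of
the logic of Algorithm 1 only: the class name MultiTrylock and the explicit process index
argument i are assumptions of this sketch, and Python offers no way to express the memory
fence that a real implementation would need between the writes and the reads in acquire.

```python
class MultiTrylock:
    """One flag r[i][x] per (process i, t-object x), all initially 0."""

    def __init__(self, n_procs, objects):
        self.r = [{x: 0 for x in objects} for _ in range(n_procs)]

    def acquire(self, i, q):
        for x in q:                        # a series of writes ...
            self.r[i][x] = 1
        contended = any(                   # ... followed by a series of reads
            self.r[t][x] == 1
            for t in range(len(self.r)) if t != i
            for x in q
        )
        if contended:
            for x in q:                    # roll back: no lock in q stays acquired
                self.r[i][x] = 0
            return False
        return True

    def release(self, i, q):
        for x in q:                        # writes only, no RAW
            self.r[i][x] = 0
        return "ok"

    def is_contended(self, i, x):
        # Reads only: some other process has its flag for x set.
        return any(self.r[t][x] != 0 for t in range(len(self.r)) if t != i)


lock = MultiTrylock(2, ["X", "Y"])
print(lock.acquire(0, {"X"}))        # True: nobody else contends on X
print(lock.acquire(1, {"X", "Y"}))   # False: X is held by process 0
print(lock.is_contended(0, "Y"))     # False: process 1 rolled back its flags
```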

We say that a process pi holds a lock on Xj after an execution π if π contains the invocation
of acquire(Q), Xj ∈ Q, by pi that returned true, but does not contain a subsequent invocation
of release(Q'), Xj ∈ Q', by pi in π. We say that Xj is locked after π by process pi if pi holds a
lock on Xj after π.

We say that Xj is contended by pi after an execution π if π contains the invocation of
acquire(Q), Xj ∈ Q, by pi but does not contain a subsequent return false or return of
release(Q'), Xj ∈ Q', by pi in π.

Let an execution π contain the invocation i_op of an operation op followed by a corresponding
response r_op (we say that π contains op). We say that Xj is uncontended (resp., locked) during
the execution of op in π if Xj is uncontended (resp., locked) after every prefix of π that contains
i_op but does not contain r_op.

A multi-trylock implementation satisfies the following properties:

• Mutual-exclusion: For any object Xj, and any execution π, there exists at most one
process that holds a lock on Xj after π.

• Progress: Let π be any execution that contains acquire(Q) by process pi. If no object in Q
is contended during the execution of acquire(Q) by a process pk ≠ pi in π then acquire(Q)
returns true in π.

• Let π be any execution that contains isContended(Xj) invoked by pi.

– If Xj is locked by pℓ, ℓ ≠ i, during the complete execution of isContended(Xj) in π,
then isContended(Xj) returns true.

– If, for all ℓ ≠ i, Xj is never contended by pℓ during the complete execution of
isContended(Xj) in π, then isContended(Xj) returns false.

Note that if Xj is neither locked nor uncontended during the complete execution of
isContended(Xj), then either of true and false can be returned.

Theorem 21 Algorithm 1 is an implementation of a multi-trylock object in which every operation
is wait-free, every operation incurs at most one multi-RAW, and isContended involves no
base-object writes.

Proof. Denote by L the shared object implemented by Algorithm 1. The operations exported
by L are wait-free, i.e., every operation returns a value to the invoking process after a finite
number of its own steps. This follows from the fact that the implementation of acquire, release
and isContended described by Algorithm 1 contains no unbounded loops or waiting statements.

Assume, by contradiction, that L does not provide mutual-exclusion: there exists an
execution π after which processes pi and pk hold a lock on the same object, say Xj. In order to
hold the lock on Xj, process pi writes 1 to register r_ij and then checks if any other process
pk has written 1 to r_kj. Since the corresponding operation acquire(Q), Xj ∈ Q, invoked by pi
returns true, pi read 0 in r_kj in Line 6. But then pk also writes 1 to r_kj and later finds that
r_ij is 1. This is because pk can write 1 to r_kj only after the read of r_kj returned 0 to pi,
which is preceded by the write of 1 to r_ij. Hence, there exists an object Xj such that r_ij = 1,
i ≠ k, but pk's acquire returns true, which requires the conditional in Line 6 to evaluate to
false for pk, a contradiction.

L also ensures progress. This is trivial since some process pi wishing to hold a lock on Xj
in an execution π invokes acquire(Q), Xj ∈ Q, which writes 1 to register r_ij and then checks
if any other process pk has written to register r_kj. If no other process contends on Xj during
the execution of acquire(Q), the conditional in Line 6 evaluates to false and, accordingly,
acquire(Q) must return true.

Let π be any execution that contains isContended(Xj) executed by pi. If no process contends
on Xj during the execution of isContended(Xj) in π, pi finds r_tj = 0 for all t and the
conditional in Line 16 evaluates to false, so isContended(Xj) returns false. However, if Xj is
locked during the execution of isContended(Xj) in π, at any point of the execution there exists
t such that r_tj = 1. Thus, the conditional in Line 16 evaluates to true and, accordingly,
isContended(Xj) must return true.

The implementation of isContended(Xj) only reads base objects. The implementation of
acquire(Q) first writes to a series of base objects and then reads a series of base objects
incurring a single multi-RAW. The implementation of release(Q) only writes to base objects. □



A.2 Progressive implementation with single multi-RAW

Algorithm 2 describes the algorithms for tm-operations of a progressive opaque STM incurring
at most a single multi-RAW per transaction.
Algorithm 2 Progressive STM with one multi-RAW: the implementation of Tk executed by pi

Shared variables:
   vj, for each t-object Xj
   L, a multi-trylock object

readk(Xj):
   ovj := read(vj)
   Rset(Tk) := Rset(Tk) ∪ {Xj}
   if isAbortable() then
      return Ak
   return the value of ovj

writek(Xj, v):
   if Xj ∉ Wset(Tk) then
      nvj := v
      Wset(Tk) := Wset(Tk) ∪ {Xj}
   return okk

tryAk():
   return Ak

tryCk():
   if |Wset(Tk)| = 0 then
      return Ck
   locked := L.acquire(Wset(Tk))
   if not locked then
      return Ak
   if isAbortable() then
      L.release(Wset(Tk))
      return Ak
   for all Xj ∈ Wset(Tk) do
      write(vj, (nvj, k))
   L.release(Wset(Tk))
   return Ck

Function: isAbortable():
   if ∃Xj ∈ Rset(Tk) : L.isContended(Xj) then
      return true
   if isInvalid() then
      return true
   return false

Function: isInvalid():
   if ∃Xj ∈ Rset(Tk) : ovj ≠ read(vj) then
      return true
   return false


Each time a transaction T_k performs a read of a t-object X_j, it reads v_j, adds X_j to its read
set and checks that the t-objects in the current read set of T_k have not been updated since T_k
read them, and additionally that no object in the current read set is locked by an updating
transaction. If some object in the read set has been modified or is locked, the transaction is
forcefully aborted. Otherwise, T_k returns the value read in v_j.

Each time T_k performs a write to a t-object X_j, it adds X_j to its write set and returns ok.

The implementation of tryC_k() uses the multi-trylock primitive described in Section A.1.
For every updating transaction T_k, tryC_k() invokes L.acquire(Wset(T_k)), where L denotes the
multi-trylock implemented by Algorithm 1. If it returns true and the subsequent isAbortable() check
passes, tryC_k() writes the new values, releases the locks and returns C_k; otherwise it
returns A_k. Read-only transactions simply return C_k.
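The read, write and commit steps described above can be sketched sequentially as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Abort`, `MultiTrylock`, `Transaction`, `mem` and `proc` are hypothetical, a dictionary stands in for the base objects v_j, an exception stands in for the response A_k, and each transaction is assumed to read a given t-object at most once.

```python
class Abort(Exception):
    """Stands in for the abort response A_k."""


class MultiTrylock:
    """Hypothetical register-based multi-trylock (see Section A.1)."""

    def __init__(self, n_procs):
        self.n = n_procs
        self.r = {}  # (process, object) -> 0 or 1

    def is_contended(self, i, x):
        return any(self.r.get((t, x), 0) for t in range(self.n) if t != i)

    def release(self, i, objs):
        for x in objs:
            self.r[(i, x)] = 0

    def acquire(self, i, objs):
        for x in objs:  # writes first ...
            self.r[(i, x)] = 1
        if any(self.is_contended(i, x) for x in objs):  # ... then reads
            self.release(i, objs)
            return False
        return True


class Transaction:
    def __init__(self, k, proc, mem, lock):
        self.k, self.proc, self.mem, self.lock = k, proc, mem, lock
        self.ov = {}  # read set, with the (value, writer) pair observed
        self.nv = {}  # write set, with the pending new value

    def _is_abortable(self):
        locked = any(self.lock.is_contended(self.proc, x) for x in self.ov)
        invalid = any(self.mem[x] != seen for x, seen in self.ov.items())
        return locked or invalid

    def read(self, x):
        self.ov[x] = self.mem[x]  # read v_j, add X_j to the read set
        if self._is_abortable():
            raise Abort
        return self.ov[x][0]

    def write(self, x, v):
        self.nv[x] = v  # deferred update: nothing is written to mem yet

    def try_commit(self):
        if not self.nv:  # read-only transactions commit immediately
            return True
        if not self.lock.acquire(self.proc, self.nv):
            raise Abort
        if self._is_abortable():
            self.lock.release(self.proc, self.nv)
            raise Abort
        for x, v in self.nv.items():
            self.mem[x] = (v, self.k)  # value tagged with the writer's id
        self.lock.release(self.proc, self.nv)
        return True
```

A reader that observed X before a concurrent writer committed is aborted by its next read, which is the read-write conflict that progressiveness permits.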

A.2.1 Proof of opacity

Let E be any execution of the TM implemented by Algorithm 2. Recall that we assume every
t-object was initialized by some fictitious committed transaction T_0 that precedes E. Let <_E
denote a total order on events in E.

Linearization points. Let H denote a linearization of E|TM constructed by selecting
linearization points of tm-operations performed in E|TM. The linearization point of a tm-operation
op, denoted ℓ_op, is associated with a base object event or a tm-event performed during the
lifetime of op using the following procedure.

First, we obtain a completion of E|TM by removing some pending invocations and adding
responses to the remaining pending invocations involving a transaction T_k as follows:

• Every incomplete read_k, write_k or tryA_k operation is removed from E|TM.

• For every pending tryC_k, if some base object v_j was written (Line 12), the response C_k
is added to the end of E|TM; otherwise A_k is added to the end of E|TM.

Now a linearization H of E|TM is obtained by associating linearization points to tm-operations
in the obtained completion of E|TM as follows:

• For every tm-read op_k that returns a non-A_k value, ℓ_{op_k} is chosen as the event in Line 5
of Algorithm 2; otherwise, ℓ_{op_k} is chosen as the invocation event of op_k.

• For every tm-write or tm-abort op_k that returns, ℓ_{op_k} is chosen as the invocation event of
op_k.

• For every op_k = tryC_k that returns C_k such that Wset(T_k) ≠ ∅, ℓ_{op_k} is associated with the
successful acquisition of the lock on Wset(T_k) (at the end of Line 7); otherwise, if op_k returns
A_k, ℓ_{op_k} is associated with the invocation event of op_k.

• For every op_k = tryC_k that returns C_k such that Wset(T_k) = ∅, ℓ_{op_k} is associated with
Line 6.

<_H denotes a total order on tm-operations in the complete sequential history H.



Serialization points. The serialization of a transaction T_j, denoted δ_{T_j}, is associated with
the linearization point of a tm-operation performed within the lifetime of the transaction.
We obtain a t-complete history H̄ from H as follows:

• For every transaction T_k in H that is live, we insert tryC_k · A_k immediately after the last
event of T_k in H.

• For every aborted transaction T_k in H, we remove each write operation in T_k together with the
matching response.

H̄ is thus a t-complete sequential history that contains only updating committed transactions
and read-only transactions, since every aborted transaction is reduced to its read-prefix. A
serialization S is obtained by associating serialization points to transactions in H̄ as follows:

• If T_k is an update transaction that commits, then δ_{T_k} is ℓ_{tryC_k}.

• If T_k is a read-only or aborted transaction, then δ_{T_k} is assigned to the linearization point
of the last tm-read that returned a non-A_k value in T_k.

<_S denotes a total order on transactions in the t-sequential history S.
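The assignment of serialization points can be made concrete with a short sketch. The record type `Txn`, its field names and the integer timestamps (positions of linearization points in the order on events of E) are assumptions for illustration; a read-only or aborted transaction is assumed to have at least one successful tm-read.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Txn:
    name: str
    committed_update: bool                 # updating transaction that commits
    lin_try_commit: Optional[int] = None   # linearization point of tryC
    lin_reads: List[int] = field(default_factory=list)  # successful tm-reads


def serialization_point(t: Txn) -> int:
    # Committed updating transactions serialize at tryC; read-only and
    # aborted ones at their last tm-read that did not return abort.
    if t.committed_update:
        return t.lin_try_commit
    return max(t.lin_reads)


def serialize(txns: List[Txn]) -> List[str]:
    # The serialization S orders transactions by serialization point.
    return [t.name for t in sorted(txns, key=serialization_point)]
```

For example, a reader whose last successful read is linearized before a writer's tryC is placed before that writer in S, regardless of when the reader started.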

Lemma 22 If T_i precedes T_j in the deferred-update order, then T_i <_S T_j.

Proof. Recall that T_i precedes T_j in the deferred-update order if there exists X ∈ Rset(T_i) ∩
Wset(T_j), T_j has committed, such that the response of read_i(X) precedes the invocation of
tryC_j() in H. Thus, ℓ_{read_i(X)} <_E ℓ_{tryC_j}.

Consider the histories depicted in Figure 2, where T_i precedes T_j in the deferred-update order
(tryC_k(X_j) denotes a tryC_k such that X_j ∈ Wset(T_k)).

(1) Consider the history depicted in Figure 2(A), where T_i is a read-only transaction and T_j
is an updating transaction that returns C_j. Assume the contrary: T_i precedes T_j in the
deferred-update order, but T_j <_S T_i, which implies that δ_{T_j} <_E δ_{T_i}, i.e., ℓ_{tryC_j(X)}
precedes the linearization point of the last tm-read in T_i that returns a non-A_i value
(say read_i(X')). Thus, the successful lock acquisition on X by T_j in Line 7 precedes the read
of the base object associated with X' by T_i in Line 5.

[Figure 2, panels (A) and (B): Assignment of serialization points respects the deferred-update order]

read_i(X') checks if any object in Rset(T_i) is locked by a concurrent transaction, then
performs read-validation (Line 7). Consider the following possible sequence of events: T_j
acquires the lock on X, updates X to shared memory, T_i reads the base object associated
with X', T_j releases the lock and finally T_i performs the check in Line 7. read_i(X') is
forced to return A_i because X has been invalidated.

Otherwise, T_j acquires the lock on X, updates X to shared memory, T_i reads the base object
associated with X', T_i performs the check in Line 7 and finally T_j releases the lock on X.
Again, read_i(X') returns A_i since T_j is holding a lock on X ∈ Rset(T_i), a contradiction.

Hence, the only possibility is that the last successful tm-read (read_i(X')) in T_i is linearized
before tryC_j(X), which implies that δ_{T_i} <_E δ_{T_j}.



(2) Suppose that T_i is an updating transaction as shown in Figure 2(B); then ℓ_{tryC_i(X_2)}
and ℓ_{tryC_j(X_1)} are assigned to Line 7 of Algorithm 2, when the locks are acquired on X_2 and
X_1 respectively. Assume the contrary that T_i precedes T_j in the deferred-update order, but
δ_{T_j} <_E δ_{T_i}; then ℓ_{tryC_j} <_E ℓ_{tryC_i}. A similar argument to the above leads to a contradiction
since tryC performs the same sequence of checks as the tm-read (Line 5).

□



Lemma 23 If T_i precedes T_j in the real-time order, then T_i <_S T_j.

Proof. This follows from the fact that for a given transaction, its serialization point is chosen
within the lifetime of the transaction, implying that if T_i precedes T_j in the real-time order,
then δ_{T_i} <_E δ_{T_j} and hence T_i <_S T_j. □



Lemma 24 If T_j reads from T_i, then T_i <_S T_j.

Proof. Assume the contrary, i.e., there exists a read_j(X), X ∈ Rset(T_j) ∩ Wset(T_i), that
returns the value of X updated in write_i(X, value) and T_j <_S T_i. T_i is an updating committing
transaction, hence δ_{T_i} = ℓ_{tryC_i}.
Consider two cases:

(1) Suppose that T_j is a read-only transaction. Thus, δ_{T_j} is assigned to the last tm-read
that returns a non-A_j value (say read_j(X')), whose linearization point precedes ℓ_{tryC_i}.
This implies that the read of the base object associated with X' by T_j in Line 5 precedes
the successful lock acquisition on X by T_i in Line 7. Thus, the write to the base object
associated with X performed by tryC_i() in Line 12 is executed after the read of the base
object performed by read_j(X) in Line 5, a contradiction.

(2) Suppose that T_j is an updating transaction. Then, ℓ_{tryC_j} <_E ℓ_{tryC_i}. Again, this implies
that the read of the base object in Line 5 executed by read_j(X) precedes the write to
the base object performed by tryC_i(), a contradiction.

□



Lemma 25 S is legal.

Proof. Recall that S is legal if every tm-read of an object X performed by a transaction T_i
returns the latest value written to X in S. Since we only consider canonic
transactions, the latest value written to X in S is the value written by the last transaction T_j
such that T_j commits, T_j <_S T_i and X ∈ Wset(T_j).

From Lemma 24 we have that for all T_i and T_j, if T_j reads from T_i, then T_i precedes T_j in S.
Thus, to prove that S is legal, it is enough to show that if T_j reads X from T_i, then there does
not exist a transaction T_k that returns C_k, X ∈ Wset(T_k), such that T_i <_S T_k <_S T_j.
Assume the contrary that

• ∃T_k, X ∈ Wset(T_k), T_k returns C_k such that T_i <_S T_k <_S T_j.

T_i and T_k are both updating transactions that commit. Thus,

(T_i <_S T_k) ⟺ (δ_{T_i} <_E δ_{T_k})

(δ_{T_i} <_E δ_{T_k}) ⟺ (ℓ_{tryC_i} <_E ℓ_{tryC_k})

Since T_j reads the value of X written by T_i, one of the following is true:

ℓ_{tryC_i} <_E ℓ_{tryC_k} <_E ℓ_{read_j(X)} (or)
ℓ_{tryC_i} <_E ℓ_{read_j(X)} <_E ℓ_{tryC_k}

If ℓ_{tryC_k} <_E ℓ_{read_j(X)}, then the successful lock acquisition on X by T_k in Line 7 precedes the
read of the base object associated with X by T_j in Line 5.

read_j(X) checks if any object in Rset(T_j) is locked by a concurrent transaction, then performs
read-validation (Line 7). Consider the following possible sequence of events: T_k acquires the
lock on X, updates X to shared memory, T_j reads the base object associated with X, T_k releases
the lock and finally T_j performs the check in Line 7. read_j(X) is forced to return A_j because
X ∈ Rset(T_j) (Line 6) and has been invalidated since last reading its value.

Otherwise, T_k acquires the lock on X, updates X to shared memory, T_j reads the base object
associated with X, T_j performs the check in Line 7 and finally T_k releases the lock on X. Again,
read_j(X) returns A_j since T_k is holding a lock on X ∈ Rset(T_j), a contradiction.

Thus, ℓ_{read_j(X)} <_E ℓ_{tryC_k}.

Consider two cases:

(1) Suppose that T_j is a read-only transaction. Then, δ_{T_j} is assigned to the last tm-read
performed by T_j that returns a non-A_j value. If read_j(X) is not the last tm-read that
returned a non-A_j value, then there exists a read_j(X') such that

ℓ_{read_j(X)} <_E ℓ_{tryC_k} <_E ℓ_{read_j(X')}

(2) Suppose that T_j is an updating transaction that commits; then δ_{T_j} = ℓ_{tryC_j}, which implies
that

ℓ_{read_j(X)} <_E ℓ_{tryC_k} <_E ℓ_{tryC_j}

The same argument as in the proof of Lemma 22 shows that both cases lead to a
contradiction, i.e., both read_j(X') and tryC_j are forced to return A_j. □



Lemma 26 Algorithm 2 implements a progressive TM.

Proof. Every transaction in a TM M whose tm-operations are defined by Algorithm 2 can
be aborted in the following scenarios:

• Read-validation failed in read_k or tryC_k.

• read_k or tryC_k returned A_k because some X_j ∈ Rset(T_k) is locked (belongs to the write set of a
concurrent transaction).

• L.acquire(Wset(T_k)) returned false in Line 21 of Algorithm 2.

Read-validation consists of checking whether the value to be returned from a tm-read of
transaction T_k is consistent with the values returned from the previous tm-reads. Hence, if validation of
a tm-read in T_k fails, it means that the t-object is overwritten by some transaction T_i such that
T_i <_S T_k, implying a read-write conflict. This is also implied if some t-object X_j ∈ Rset(T_k) is
locked and the tm-operation returns abort, since the t-object is in the write set of a concurrent transaction.

Acquisition of the multi-trylock can return false for T_i because there exists some X_j ∈
Wset(T_i) that was being written to by a concurrent transaction, implying a write-write
conflict.

Hence, for every transaction T_i ∈ H that is aborted, there exists a conflicting t-object that
is contended by a concurrent transaction. Thus, Algorithm 2 implements a progressive TM. □



Theorem 16 There exists a progressive opaque STM implementation that employs a single
multi-RAW per transaction. Moreover, no RAWs are performed by read-only transactions.

Proof. From Lemmas 22, 23, 25 and 26, Algorithm 2 implements a progressive, opaque STM.

Any process executing a transaction T_k holds the lock on Wset(T_k) only once during tryC_k.
If |Wset(T_k)| = 0, then the transaction simply returns, incurring no RAWs. Thus, from
Theorem 21, Algorithm 2 incurs a single multi-RAW per updating transaction and no RAWs
are performed in read-only transactions. □



A.3 Progressive implementation with single mCAS

Algorithm 3 describes the implementation of a progressive, opaque TM incurring a single AWAR
per updating committed transaction. The implementations of reads and writes are similar to
the ones described in Section A.2, except that each time a transaction T_k performs a read of a
t-object X_i, it reads v_i, adds X_i to its read set and checks only that the t-objects in the current read set
of T_k have not been updated since T_k read them. If this is not the case, the transaction is
forcefully aborted. Otherwise, T_k returns the value read in v_i.

For every updating transaction T_k, tryC_k() invokes the mCAS primitive over Dset(T_k). If
the mCAS returns true, tryC_k() returns C_k; otherwise it returns A_k. Read-only transactions
simply return C_k.

Algorithm 3 Progressive STM with single mCAS; implementation of transaction T_k by process p_i

 1: Shared variables:
 2:   v_j, for each t-object X_j

 3: read_k(X_j):
 4:   ov_j := read(v_j)
 5:   Rset(T_k) := Rset(T_k) ∪ {X_j}
 6:   nv_j := ov_j
 7:   if isInvalid() then
 8:     return A_k
 9:   return the value of ov_j

10: write_k(X_j, v):
11:   if X_j ∉ Wset(T_k) then
12:     nv_j := v
13:     Wset(T_k) := Wset(T_k) ∪ {X_j}
14:   return ok_k

15: tryA_k():
16:   return A_k

17: tryC_k():
18:   if Wset(T_k) = ∅ then
19:     return C_k
20:   for all X_j ∈ Wset(T_k) do
21:     ov_j := read(v_j)
22:   Let Wset(T_k) ∪ Rset(T_k) be {X_{i_1}, ..., X_{i_m}}
23:   V = {v_{i_1}, ..., v_{i_m}}
24:   OV = {ov_{i_1}, ..., ov_{i_m}}
25:   NV = {nv_{i_1}, ..., nv_{i_m}}
26:   if mCAS(V, OV, NV) then
27:     return C_k
28:   return A_k

29: Function: isInvalid():
30:   if ∃X_j ∈ Rset(T_k) : ov_j ≠ read(v_j) then
31:     return true
32:   return false
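The mCAS-based commit can be sketched as follows. This is an illustration under stated assumptions rather than the paper's implementation: `mcas`, `Txn`, `Abort` and `mem` are hypothetical names, a `threading.Lock` merely stands in for the atomicity that a hardware multi-word compare-and-swap would provide, and, as a deliberate deviation from the pseudocode, the commit re-reads the current value only for write-set objects the transaction has not read, so that earlier reads remain validated by the mCAS.

```python
import threading

_atomic = threading.Lock()  # stand-in for the atomicity of a real mCAS


class Abort(Exception):
    """Stands in for the abort response A_k."""


def mcas(mem, keys, old, new):
    # Atomically install new[k] for every key iff mem[k] == old[k] for all keys.
    with _atomic:
        if any(mem[k] != old[k] for k in keys):
            return False  # a failed mCAS only reads
        for k in keys:
            mem[k] = new[k]
        return True


class Txn:
    def __init__(self, mem):
        self.mem = mem
        self.rset, self.wset = set(), set()
        self.ov, self.nv = {}, {}

    def read(self, x):
        self.ov[x] = self.mem[x]
        self.rset.add(x)
        if x not in self.wset:  # assumption: keep a pending write intact
            self.nv[x] = self.ov[x]
        if any(self.mem[y] != self.ov[y] for y in self.rset):
            raise Abort  # read-validation failed
        return self.ov[x]

    def write(self, x, v):
        self.nv[x] = v
        self.wset.add(x)

    def try_commit(self):
        if not self.wset:
            return True  # read-only: no AWAR at all
        for x in self.wset - self.rset:
            self.ov[x] = self.mem[x]
        keys = sorted(self.wset | self.rset)  # the data set Dset(T_k)
        return mcas(self.mem, keys, self.ov, self.nv)
```

Objects that were only read are "written back" with their old value, which is how a single mCAS over the whole data set validates the read set and installs the write set in one step.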



A.3.1 Proof of opacity

Using the same notation as in the proof of opacity for Algorithm 2 in Section A.2.1, let E' denote
an execution of the TM implemented by Algorithm 3 and H' a linearization of the execution
history E'|TM. We construct H' by assigning linearization points to tm-operations performed
in a completion of E'|TM.

The linearization point of a tm-operation op_k performed by transaction T_k in a completion
of E'|TM is associated with an access of a base object or a tm-event performed during the lifetime
of the tm-operation as follows.

• For every tm-read op_k that returns a non-A_k value, ℓ_{op_k} is chosen as the event in Line 4
of Algorithm 3; otherwise, ℓ_{op_k} is chosen as the invocation event of op_k.

• For every tm-write op_k that returns, ℓ_{op_k} is chosen as the invocation event of op_k.

• For every op_k = tryC_k that returns C_k such that Wset(T_k) ≠ ∅, ℓ_{op_k} is associated with
the successful mCAS (Line 26); otherwise, if op_k returns A_k, ℓ_{op_k} is
associated with the invocation event of op_k.

• For every op_k = tryC_k that returns C_k such that Wset(T_k) = ∅, ℓ_{op_k} is associated with
Line 19.



The t-sequential history S' is constructed in the same manner as described in Section A.2.1 from
the above assignment of linearization points. Note that the lemmas proven for Algorithm 2
are clearly also valid for Algorithm 3.

Theorem 27 There exists a progressive opaque STM implementation that employs exactly one
AWAR per transaction. Moreover, no AWARs are performed in read-only or aborted
transactions.

Proof sketch. Clearly, Algorithm 3 implements an opaque STM.

Algorithm 3 is progressive since every forcefully aborted transaction aborts either due to
read-invalidation or because mCAS returns false, implying that there exists a conflicting t-object
contended by a concurrent transaction. Also note that, if several transactions concurrently
conflict on a single t-object, the first transaction to execute the mCAS in Line 26 is returned
true and commits. Thus, the implementation guarantees that in any set of concurrent
conflicting transactions, at least one of the transactions commits, which actually provides a stronger
progress guarantee than progressiveness or even strong progressiveness. Indeed, a transaction
T_k can abort only if a concurrent committed transaction modifies the value of v_j for some
X_j ∈ Dset(T_k).

Algorithm 3 performs a single mCAS operation on Dset(T_k) of a transaction that
commits during tryC_k; if T_k aborts, the mCAS only performs reads of base objects. For read-only
transactions, the transaction simply returns, incurring no AWAR. □



A.4 Starvation-free multi-trylock

In this section, we define a multi-trylock object analogous to the one defined in Section A.1
but whose operations are starvation-free. The algorithm is inspired by the Black-White Bakery
Algorithm [25] and uses a finite number of bounded registers.

The algorithm uses the following shared variables: registers r_{ij} for each process p_i and object
X_j, a shared bit color ∈ {B, W}, registers LA_i ∈ {0, ..., N} for each p_i that denote a Label, and
MC_i ∈ {B, W} for each p_i.

We say (LA_i, i) < (LA_k, k) iff LA_i < LA_k, or LA_i = LA_k and i < k.
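The order on (Label, process id) pairs is the usual lexicographic order. A minimal sketch, where the function name `label_less` is an assumption for illustration:

```python
def label_less(la_i, i, la_k, k):
    """True iff (LA_i, i) < (LA_k, k): smaller label first, ties broken by id."""
    return la_i < la_k or (la_i == la_k and i < k)
```

Since process identifiers are distinct, any two pairs with non-zero labels are comparable, and the definition coincides with Python's built-in tuple comparison `(la_i, i) < (la_k, k)`.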

A starvation-free multi-trylock implementation satisfies the following properties:

• Mutual-exclusion: For any object X_j, and any execution π, there exists at most one
process that holds a lock on X_j after π.

• Progress: Let π be any execution that contains acquire(Q) by process p_i. If no other
process p_k, k ≠ i, contends infinitely long on some X_j ∈ Q, then acquire(Q) returns true
in π.

• Let π be any execution that contains isContended(X_j) invoked by p_i.

  – If X_j is locked by p_ℓ, ℓ ≠ i, during the complete execution of isContended(X_j) in π,
  then isContended(X_j) returns true.

  – If, for all ℓ ≠ i, X_j is never contended by p_ℓ during the execution of isContended(X_j) in π,
  then isContended(X_j) returns false.

Lemma 28 In every execution π of Algorithm 4, if p_i holds a lock on some object X_j after π,
then one of the following conditions must hold:

(1) for some k ≠ i, LA_k ≠ 0: if MC_k = MC_i, then (LA_k, k) > (LA_i, i)

(2) for some k ≠ i, LA_k ≠ 0: if MC_k ≠ MC_i, then MC_i ≠ color

Proof. In order to hold the lock on X_j, some process p_i writes 1 to r_{ij}, writes a value, say W,
to MC_i, reads the Labels of other processes that have obtained the same color as itself and
generates a Label greater by one than the maximum Label read (Line 11). Observe that until
the value of the color bit is changed, all processes read the same value W. The first process p_i
to hold the lock on X_j changes the color bit to B when releasing the lock, and hence the value
read by all subsequent processes will be B until it is changed again. Now consider two cases:

Algorithm 4 Starvation-free multi-trylock invoked by process p_i

 1: Shared variables:
 2:   LA_i, for each process p_i, initially 0
 3:   MC_i ∈ {B, W}, for each process p_i, initially W
 4:   color ∈ {B, W}, initially W
 5:   r_{ij}, for each process p_i and each t-object X_j, initially 0

 6: acquire(Q):
 7:   for all X_j ∈ Q do
 8:     write(r_{ij}, 1)
 9:   c_i := color
10:   write(MC_i, c_i)
11:   write(LA_i, 1 + max({LA_k | MC_k = MC_i}))
12:   while ∃j : ∃k ≠ i : isContended(X_j) ∧ ((LA_k ≠ 0 ∧ MC_k = MC_i ∧ (LA_k, k) < (LA_i, i)) ∨
13:         (LA_k ≠ 0 ∧ MC_k ≠ MC_i ∧ MC_i = color)) do
14:     no op
15:   end while
16:   return true

17: release(Q):
18:   for all X_j ∈ Q do
19:     write(r_{ij}, 0)
20:   if MC_i = B then
21:     write(color, W)
22:   else
23:     write(color, B)
24:   write(LA_i, 0)
25:   return ok

26: isContended(X_j):
27:   if ∃p_t : r_{tj} ≠ 0, t ≠ i then
28:     return true
29:   return false



(1) Assume that there exists a process p_k, k ≠ i, LA_k ≠ 0 and MC_k = MC_i such that
(LA_k, k) < (LA_i, i), but p_i holds a lock on X_j after π. Thus, isContended(X_j) returns
true to p_i because p_k writes to r_{kj} (Line 8) before writing to LA_k (Line 11). By assumption,
(LA_k, k) < (LA_i, i), LA_k > 0 and MC_i = MC_k, but the conditional in Line 13 returned
true to p_i without waiting for p_k to stop contending on X_j, a contradiction.

(2) Assume that there exists a process p_k, k ≠ i, LA_k ≠ 0 and MC_k ≠ MC_i such that
MC_i = color, but p_i holds a lock on X_j after π. Again, since LA_k > 0, isContended(X_j)
returns true to p_i, MC_k ≠ MC_i and MC_i = color, but the conditional in Line 13 returned
true to p_i without waiting for p_k to stop contending on X_j, a contradiction.

□



Theorem 29 Algorithm 4 is an implementation of a multi-trylock object in which every operation
is starvation-free and incurs at most four RAWs.

Proof. Denote by L the shared object implemented by Algorithm 4.

Assume, by contradiction, that L does not provide mutual-exclusion: there exists an
execution π after which processes p_i and p_k, k ≠ i, hold a lock on the same object, say X_j. Since both
p_i and p_k have performed the write to LA_i and LA_k, respectively, in Line 11, LA_i, LA_k > 0. Consider
two cases:

(1) If MC_k = MC_i, then from Condition 1 of Lemma 28 we have (LA_k, k) < (LA_i, i) and
(LA_k, k) > (LA_i, i), a contradiction.

(2) If MC_k ≠ MC_i, then from Condition 2 of Lemma 28 we have MC_i ≠ color and MC_k ≠
color, which implies MC_k = MC_i, a contradiction.

L also ensures progress. If process p_i wants to hold the lock on an object X_j, i.e., invokes
acquire(Q), X_j ∈ Q, it checks if any other process p_k holds the lock on X_j. If such a process
p_k exists and MC_k = MC_i, then clearly isContended(X_j) returns true for p_i and (LA_k, k) <
(LA_i, i). Thus, p_i fails the conditional in Line 13 and waits until p_k releases the lock on
X_j to return true. However, if p_k contends infinitely long on X_j, p_i is also forced to wait
indefinitely to be returned true from the invocation of acquire(Q). The same argument works
when MC_k ≠ MC_i, since when p_k stops contending on X_j, isContended(X_j) eventually returns
false for p_i if p_k does not contend infinitely long on X_j.

All operations performed by L are starvation-free. Each process p_i that successfully holds the
lock on an object X_j in an execution π invokes acquire(Q), X_j ∈ Q, obtains a color and chooses a
value for LA_i, since there is no way to be blocked while writing to LA_i. The response of operation
acquire(Q) by p_i is only delayed if there exists a concurrent invocation of acquire(Q'), X_j ∈ Q',
by p_k in π. In that case, process p_i waits until p_k invokes release(Q') and writes to r_{kj}, and
eventually holds the lock on X_j. The implementations of release and isContended are
wait-free operations (and hence starvation-free) since they contain no unbounded loops or waiting
statements.

The implementation of isContended(X_j) only reads base objects. The implementation of
release(Q) writes to a series of base objects (Line 18) and then reads a base object (Line 20),
incurring a single RAW. The implementation of acquire(Q) writes to base objects (Line 8), reads
the shared bit color (Line 9), which is one RAW; writes to a base object (Line 10), reads the Labels
(Line 11), which is one RAW; writes to its own Label and finally performs a sequence of reads when
evaluating the conditional in Line 13, which is one RAW.

Thus, Algorithm 4 incurs at most four RAWs. □
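The counting argument can be checked mechanically on a trace of base-object accesses. The sketch below uses a simplified notion of RAW, assumed here for illustration: every switch from a run of writes to a following read counts once, and the formal requirement on which base objects are involved is ignored. The function `count_raws` and the two traces are hypothetical; the traces follow the access order described in the proof for a two-object set Q.

```python
def count_raws(events):
    """Count write-then-read switches in a list of ('w' | 'r', name) events."""
    raws = 0
    pending_write = False
    for kind, _name in events:
        if kind == "w":
            pending_write = True
        elif kind == "r" and pending_write:
            raws += 1  # first read after one or more writes
            pending_write = False
    return raws


# acquire(Q): write r_ij, read color, write MC_i, read labels, write LA_i,
# then the reads performed while evaluating the waiting conditional.
acquire_trace = [
    ("w", "r_i1"), ("w", "r_i2"), ("r", "color"),
    ("w", "MC_i"), ("r", "LA_k"), ("r", "MC_k"),
    ("w", "LA_i"), ("r", "r_k1"), ("r", "LA_k"), ("r", "MC_k"), ("r", "color"),
]

# release(Q): write r_ij, read MC_i, write color, write LA_i.
release_trace = [
    ("w", "r_i1"), ("w", "r_i2"), ("r", "MC_i"), ("w", "color"), ("w", "LA_i"),
]
```

Under these assumptions the acquire trace yields three switches and the release trace one, matching the bound of four.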



A.5 Strong progressive implementation with constant RAWs

Let CObj_H(T_i) denote the set of t-objects over which transaction T_i ∈ parts(H) conflicts
with any other transaction in history H, i.e., X ∈ CObj_H(T_i) if there exists a transaction
T_k ∈ parts(H), k ≠ i, such that T_i conflicts with T_k on X in H. Then, CObj_H(Q) =
∪{CObj_H(T_i) | ∀T_i ∈ Q} denotes the union of sets CObj_H(T_i) for all transactions in Q.

Let CTrans(H) denote the set of non-empty subsets of parts(H) such that a set Q is in
CTrans(H) if no transaction in Q conflicts with a transaction not in Q.
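These two definitions can be illustrated with a short sketch. It assumes a common notion of conflict that is not restated in this section: two transactions conflict on X if X is in both data sets and in the write set of at least one of them (concurrency of the two transactions is not modeled). The dictionary layout and the names `conflicts_on`, `cobj`, `cobj_set` and `in_ctrans` are hypothetical.

```python
def conflicts_on(t1, t2):
    """Objects on which t1 and t2 conflict (assumed conflict notion)."""
    d1 = t1["rset"] | t1["wset"]
    d2 = t2["rset"] | t2["wset"]
    return {x for x in d1 & d2 if x in t1["wset"] or x in t2["wset"]}


def cobj(name, txns):
    """CObj_H(T): objects on which T conflicts with any other transaction."""
    out = set()
    for other, t in txns.items():
        if other != name:
            out |= conflicts_on(txns[name], t)
    return out


def cobj_set(q, txns):
    """CObj_H(Q): union of CObj_H(T) over all T in Q."""
    return set().union(*(cobj(n, txns) for n in q))


def in_ctrans(q, txns):
    """Q is in CTrans(H): non-empty and closed under conflicts."""
    return bool(q) and all(
        not conflicts_on(txns[a], txns[b])
        for a in q for b in txns if b not in q
    )
```

For instance, with a reader and a writer of X plus an unrelated reader of Y, the set containing the first two is in CTrans(H) and conflicts on exactly one object, which is the situation strong progressiveness addresses.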

Definition 30 A TM implementation M is strongly progressive if M is weakly progressive and
for any history H of M, there does not exist a prefix H' of H in which, for every set Q ∈ CTrans(H')
of transactions that are live in H' such that |CObj_{H'}(Q)| ≤ 1, every transaction in Q is forcefully
aborted in H.

Algorithm 5 describes the implementation of the tryC operation of a strongly progressive,
opaque TM. The only modification over the tryC implementation of Algorithm 2 is that in
Algorithm 5 every transaction with |Rset| = 0 eventually commits. The read, write, tryA and
isAbortable operations are the same as in Algorithm 2.

Theorem 31 Algorithm 5 implements a strongly progressive TM.

Algorithm 5 Strongly progressive, opaque STM: the implementation of T_k executed by p_i

 1: Shared variables:
 2:   v_j, for each t-object X_j
 3:   L, a starvation-free multi-trylock object

 4: tryC_k():
 5:   if |Wset(T_k)| = 0 then
 6:     return C_k
 7:   locked := L.acquire(Wset(T_k))
 8:   if isAbortable() then
 9:     L.release(Wset(T_k))
10:     return A_k
11:   for all X_j ∈ Wset(T_k) do
12:     write(v_j, (nv_j, k))
13:   L.release(Wset(T_k))
14:   return C_k

Proof. Every transaction in a TM M whose tm-operations are defined by Algorithm 5 can
be aborted in the following scenarios:

• Read-validation failed in read_k or tryC_k.

• read_k or tryC_k returned A_k because some X_j ∈ Rset(T_k) is locked (belongs to the write set of a
concurrent transaction).

Thus, Algorithm 5 implements a weakly progressive TM (from Lemma 26).

To show that Algorithm 5 also implements a strongly progressive STM, we need to show that
for every set of transactions that concurrently contend on a single t-object, at least one of the
transactions is not aborted.

Consider transactions T_i and T_k that concurrently attempt to execute tryC_i and tryC_k such
that X_j ∈ Wset_i ∪ Wset_k. Consequently, they both invoke the acquire operation of the
multi-trylock (Line 7) and thus, from Theorem 29, both T_i and T_k must commit eventually. Also, if
validation of a tm-read in T_k fails, it means that the t-object is overwritten by some transaction
T_i such that T_i precedes T_k, implying that at least one of the transactions commits. Otherwise,
some t-object X_j ∈ Rset(T_k) is locked and the tm-operation returns abort since the t-object is in the write set
of a concurrent transaction T_i. While it may still be possible that T_i returns A_i after acquiring
the lock on Wset_i, strong progressiveness only guarantees progress for transactions that conflict
on at most one t-object. Thus, in either case, for every set of transactions that conflict on at
most one t-object, at least one transaction is not forcefully aborted. □



Theorem 17 There exists a strongly progressive single-version opaque STM implementation
with starvation-free operations that uses invisible reads and employs at most four RAWs per
transaction. Moreover, no RAWs are performed in read-only transactions.

Proof. The correctness of Algorithm 5 clearly follows from the proof of opacity presented
in Section A.2.1 for Algorithm 2. From Theorem 31, it is also strongly progressive.

Any process executing a transaction T_k holds the lock on Wset(T_k) only once during tryC_k.
If |Wset(T_k)| = 0, then the transaction simply returns C_k, incurring no RAWs. Thus, from
Theorem 29, Algorithm 5 incurs at most four RAWs per updating transaction and no RAWs
are performed in read-only transactions. □



B RAW/AWAR cost of probabilistically permissive STMs 

Theorem 32 Let M be a probabilistically permissive opaque STM implementation. Then, for
any m, there exists, with positive probability, an execution in which a read-only transaction T_1
contains Ω(m) non-overlapping RAWs or AWARs on base objects, where m = |Rset(T_1)|.

Proof. For the proof, note that we only need to show that there exists an execution of the
probabilistically permissive TM that is the same as the execution of a permissive TM. Then,
the construction and arguments used in the proof of Theorem 13 can be extended to the
probabilistic case.

Let E denote the execution depicted in Figure 1, where T_3 performs a read of X_1, then T_2
performs a write on X_1 and commits, and finally T_1 performs a series of reads on X_1, ..., X_m.
We proceed by induction by considering R_1(X_k), the k-th read of T_1, 2 ≤ k ≤ m.

(1) Imagine an extension of E, denoted by E', in which T_3 performs a W_3(X_k) immediately
after R_1(X_k) and then tries to commit. A serialization of H' = E'|TM should place
T_3 before T_2 (deferred-update order) and T_2 before T_1. The execution of R_1(X_k) does not
modify base objects; hence, T_3 does not observe R_1(X_k) in E'. In a probabilistically permissive
TM, the tm-operation W_3(X_k) can return one of the following values: A_k or ok_k. Note that this
response is chosen by sampling uniformly at random from the set of possible return values; thus,
there exists a positive probability that T_3 commits successfully (when it returns ok_k). But since T_1
performs R_1(X_k) before T_3 commits and T_3 updates X_k, T_1 also precedes T_3 in the
deferred-update order. Thus, T_3 cannot precede T_1 in any serialization and we establish a
contradiction. Consequently, there exists, with positive probability, an execution in which each
R_1(X_k), 2 ≤ k ≤ m, performs a write to a base object.

(2) Let π be a fragment of E that represents the complete execution of R_1(X_k). Clearly, there
exists, with positive probability, an execution in which π contains a write to a base object.
Let π_w be the first write to a base object in π and π'_w the shortest fragment of π that
contains the atomic section to which π_w belongs; if π_w is not part of an atomic section,
π'_w = π_w. Thus, π can be represented as π_s · π'_w · π_f.

Suppose that π does not contain a RAW or AWAR. Since π'_w does not contain an AWAR
(atomic write-after-read), there are no read events in π'_w that precede π_w. Thus, π_w is
the first base-object event in π'_w. Consider the execution fragment π_s · ρ, where ρ is
the complete execution of {W_3(X_k), TC_3} by transaction T_3. By Definition 9, such an
execution exists with positive probability in which T_3 commits. Since π_s does not perform
any base-object write, π_s · ρ is indistinguishable to T_3 from ρ.

Also, by our assumption, π'_w · π_f contains no RAW, i.e., any read performed in π'_w · π_f can
only be applied to base objects previously written in π'_w · π_f. Thus, in a probabilistically
permissive TM in which responses to tm-operations are chosen by independent coin-tosses,
there exists, with positive probability, an execution π_s · ρ · π'_w · π_f that is indistinguishable
to T_1 from π. However, in π_s · ρ · π'_w · π_f, T_3 commits (as in ρ) but T_1 ignores the value
written by T_3 to X_k. But T_3 can only be serialized before T_1, a contradiction.

□